ABSTRACT
Online reviews have become a popular way for consumers to express personal
evaluations of products. Ecommerce firms have invested heavily in review systems
because of the impact of product reviews on product sales and shopping behavior.
However, the usefulness of product reviews is undermined by the increasing appearance of
shill, or fake, reviews. As initial steps to deter and detect shill reviews, this study attempts
to understand the characteristics of shill reviews and their influence on perceived product
quality and shopping behavior. To reveal the linguistic characteristics of shill reviews,
this study compares shill reviews and normal reviews on informativeness, readability and
subjectivity level. The results show that these features can be used as reliable indicators
to separate shill reviews from normal reviews. An experiment was conducted to measure
the impact of shill reviews on perceived product quality. The results show that positive
shill reviews significantly increase consumers' quality perceptions for thinly reviewed
products. This finding provides strong evidence of the risks of shill reviews and
emphasizes the need to develop effective detection and prevention methods.
1. Introduction

1.1 Overview 

Buyers use online reviews as a source of knowledge about products they want to 
buy. The knowledge contained in the reviews reflects personal experiences of the 
reviewer. These experiences may provide consumers with additional information not 
mentioned in the official product description or allow them to verify that the information 
advertised by the manufacturer is accurate. The information in product reviews can be
used to overcome the problem of information asymmetry (sellers possess more product
information than buyers), which is exacerbated in online sales environments (Ba and
Pavlou, 2002). Thus, reviews help online buyers make more informed purchase
decisions. For example, a study of video game buyers shows that purchase decisions
were positively influenced by the use of online reviews (Bounie, Bourreau, Gensollen
and Waelbroeck, 2005). Because reviews affect buyers' purchase decisions, which
directly affect product sales, sellers have a motive to use fake reviews to
provide buyers with misleading or incorrect product information.

In this study we refer to fake reviews as "shill reviews". The terms "shill" and
"shilling" are used in studies about reputation manipulation. Lam and Riedl (2004)
define shills as users "whose false opinions are intended to mislead other users". We
extend this definition by specifying that a shill is a person who writes a review for a
product without disclosing the relationship between the seller and the review writer. A shill
can be the seller or someone compensated by the seller for writing a review. Thus, shills
can be sellers, distributors, manufacturers and authors who benefit from the sales of the
product. Wu, Greene, Smyth and Cunningham (2010) define "shill reviews" as reviews
that "distort popularity rankings given that the objective is to improve the online
reputation". By the definitions above, any review can potentially be a shill review, making
it very difficult to detect shill reviews.

Despite the difficulty in detecting shill reviews, some anecdotal evidence has
emerged about their existence. Review manipulation was found even on reputable online
marketplaces such as Amazon.com and BarnesandNoble.com (Hu, Bose, Gao and Liu,
2011a; Hu, Liu and Sambamurthy, 2011b). The review system on Google was also
attacked. An investigation by the Denver 7News channel discovered that a woman was hired
to create more than 50 Google accounts to publish 5-star reviews for multiple local
businesses¹. BBC News reported that Gary Beal, a business owner, was a victim of
review manipulation². Beal found that a local competitor had posted a negative review about
his company in order to damage his reputation and steal his customers. In 2009, Belkin, a
networking and peripherals manufacturer, was reported to have hired people to write fake
positive reviews for its products on Amazon.com³. Later, Belkin management issued an
apology for this action⁴. In the music industry, marketers disguised as consumers
promoted newly released CDs in online communities such as discussion forums or fan
sites (Mayzlin, 2006). According to Gartner, an IT research and advisory company, by
the year 2014, 10-15% of media reviews will be fake reviews⁵.



¹ http://www.thedenverchannel.com/news/31087210/detail.html
² http://news.bbc.co.uk/2/hi/programmes/click_online/8826258.stm
³ http://www.thedailybackground.com/2009/01/16/exclusive-belkins-development-rep-is-hiring-people-to-write-fake-positive-amazon-reviews/
⁴ http://news.cnet.com/8301-1001_3-10145399-92.html
⁵ http://www.gartner.com/it/page.jsp?id=2161315



There are several factors that allow shill attacks to be effective. First, the most
important part of a product review is its overall rating. In current review systems, the
overall rating of a product is the simple average of the ratings of all of its reviews. So a direct
way to change the average rating of a product is to simply submit a review. The fewer
reviews a product has, the more impact a new review has on the overall rating. Therefore,
thinly reviewed products, such as new products or specialized products, can benefit
from shill attacks. Second, it is very simple to submit a review for a product. Normally,
an account is required for a reviewer to submit a review, but the account registration
process usually only requires the reviewer to have an email address, which can easily be
obtained for free. Third, reviewers are usually anonymous, so they
do not have to take responsibility for the content of their reviews. Finally, unlike
reviews for sellers, product reviews can be submitted by reviewers who are not required
to demonstrate product ownership.
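
The arithmetic behind the first factor can be made concrete. The sketch below assumes the simple-average rating rule described above; the function names and the example ratings are illustrative and are not drawn from the study's data.

```python
def rating_after_new_review(current_avg: float, n_reviews: int, new_rating: float) -> float:
    """Simple-average rating after one more review is added."""
    return (current_avg * n_reviews + new_rating) / (n_reviews + 1)


def rating_shift(current_avg: float, n_reviews: int, new_rating: float) -> float:
    """Change in the average rating caused by the new review."""
    return rating_after_new_review(current_avg, n_reviews, new_rating) - current_avg


if __name__ == "__main__":
    # Hypothetical case: one 5-star shill review added to products averaging 3.0 stars.
    for n in (2, 10, 100):
        print(n, round(rating_shift(3.0, n, 5.0), 3))
```

Under these assumptions a single 5-star review raises a 3.0-star average by about 0.67 stars when the product has two reviews, but by only about 0.02 stars when it has one hundred, which is why thinly reviewed products are the natural targets.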

1.2 The linguistic characteristics of shill reviews 

Although the existence of review manipulation is known, researchers have had
difficulty developing effective methods to detect fake reviews and to measure the impact
of shill reviews on consumers. Research efforts have been made to identify product groups
whose reviews are more likely to be manipulated (Hu et al., 2011a; Hu et al., 2011b).
However, the results of these studies have been limited to verifying the existence of
review manipulation rather than identifying the fake reviews themselves. It is difficult to
identify specific fake reviews even when the identification process is done manually
(Jindal and Liu, 2007). We argue that to effectively detect fake product reviews, a better
understanding of the linguistic characteristics of fake reviews must be developed. In
this study, we explore the linguistic characteristics of fake reviews, such as informativeness,
subjectivity and readability, by comparing their text comments to those of normal
reviews using natural language processing (NLP) techniques.

A comparison between shill reviews and normal reviews reveals the characteristics
of shill reviews. To measure the informativeness of a review, a novel method integrating
multiple NLP techniques was developed to extract the product features included in the
content of product reviews and classify them into official and unofficial features.
Subjectivity, which reflects product usage experience, was measured using the subjectivity
detection approach suggested by Pang and Lee (2004b). Readability, often used to
detect text deception, was measured by five popular readability measures. The
measures of the linguistic characteristics of shill and normal reviews were compared
using independent-samples t-tests.
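
As an illustration of the comparison step, the following sketch computes the test statistic of an independent-samples t-test using only the Python standard library. It assumes equal population variances (the pooled form); whether the study used this form or an unequal-variance variant is not stated here, and the function name and sample values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Student's t statistic and degrees of freedom for two independent samples.

    Assumes equal population variances (pooled-variance form).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


if __name__ == "__main__":
    # Hypothetical per-review scores for one linguistic measure.
    shill_scores = [1, 2, 3, 4, 5]
    normal_scores = [2, 3, 4, 5, 6]
    print(pooled_t_statistic(shill_scores, normal_scores))
```

The resulting statistic would then be compared against the t distribution with the returned degrees of freedom to obtain a p-value.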

The results of the feature extraction method give useful information about 
the official features and unofficial features discussed in the reviews. This method 
can be used to improve review summarization tasks. Comparing these product 
features shows that the linguistic characteristics can be used as separators to 
differentiate shill reviews from normal reviews. This finding indicates that 
informativeness, subjectivity and readability can be used as factors to create an 
effective shill review detection method. 



1.3 The impact of shill reviews on perceived quality 

Despite the impact of reviews on purchase decisions found in multiple studies
(Bounie et al., 2005; Chevalier and Mayzlin, 2006), there has been no clear explanation
of the reason underlying this relationship. We argue that consumers use online
reviews to gain trust in products. Product reviews have become an important source
of product information (Urban, 2005). Consumers use reviews to verify the quality of the
product advertised by the manufacturer. Thus, consumers use reviews to confirm their
perception of product quality (Moe, 2009). Since perceived quality plays an important
role in consumers' purchase decision-making process (Tsiotsou, 2005; Zeithaml, 1988),
reviews indirectly affect customer purchase decisions. The purpose of shill reviews is to
change consumers' perception of the quality of target products. The objective of
this study is to measure the impact of product reviews on perceived quality and the effect
of positive shill reviews on improving quality perceptions.

Following Zeithaml's definitions, product reviews can be treated as an extrinsic
attribute that has an impact on the reputation of the product (Zeithaml, 1988). We extend
Zeithaml's model by hypothesizing that customer reviews can also affect perceived
quality. To isolate the effect of the reviews, other factors in this model, such as price and
brand name, are controlled. In the experiment, different sets of reviews contain different
numbers of positive shill reviews. An advantage of collecting data via an experiment is the
ability to monitor review usage, such as the quantity of reviews read and the time spent
reading the reviews, which is difficult to observe in a real-world environment.



Our results reveal some interesting characteristics of the relationship between
product reviews and review usage. Among the variables included in the rating summary,
only the average rating influences the first impression. When there are ten or fewer
reviews, consumers tend to read all of them. However, the results show that when more
reviews are available, consumers spend less time reading each review. The results also
show that shill reviews have a significant effect on product quality perception:
the presence of shill reviews increases perceived product quality.

The findings of this study make both theoretical and practical contributions.
Theoretically, we identify customer reviews as one of the factors that have an impact
on perceived quality. We provide evidence that word-of-mouth in the online shopping
environment influences perceived product quality. In practice,
marketers can use online reviews as a tool to improve quality perception, especially in
cases where advertising doesn't effectively do so (Clark, Doraszelski and Draganska,
2009). Our findings about the effect of shill reviews also help to raise awareness of
the risks posed by shill reviews.



2. Literature Review 

2.1 The effect of product reviews 

Literature in economics and marketing has shown that online product reviews are 
used widely by consumers to make both online and offline purchases (Bansal and Voyer, 
2000; Chatterjee, 2001; Godes and Mayzlin, 2004). The information contained in product 
reviews helps consumers gather useful information about products they intend to 
purchase. For instance, gamers between 19 and 25 who read more online video game 
reviews tend to purchase more video games (Bounie et al., 2005). The additional product 
information provided by the reviews also helps consumers mitigate the problem of
information asymmetry and therefore increases their confidence in making the purchase
decision, which has a direct impact on product sales (Ba et al., 2002; Infosino, 1986).

Table 2.1 summarizes studies that show the important role of online product 
reviews on product sales and buyer behaviors. Using movie box office data, Liu (2006) 
and Duan, Gu and Whinston (2008) showed that the quantity of reviews positively 
impacts movie revenue. These two studies found no effect of the average rating on movie
sales. On the other hand, a significant effect of average rating of the reviews on product
sales was found in other studies (Chevalier et al., 2006; Cui, Lui and Guo, 2010; 
Dellarocas, Zhang and Awad, 2007; Ye, Law and Gu, 2009). Explaining the effect of the 
rating score on product sales, Forman, Ghose and Wiesenfeld (2008) stated that 
consumers use ratings as a measurement for product quality. The authors also believed 
that a good rating can draw the attention of the buyers and lead to a buying decision. 



Table 2.1 Literature summary about the impact of product reviews

Article                     Product category            Dependent variable          Significant WOM effects
--------------------------  --------------------------  --------------------------  -------------------------------------------------------
Liu (2006)                  Movies                      Sales                       Number of posts (Volume)
Duan et al. (2008)          Movies                      Sales                       Number of posts (Volume)
Dellarocas et al. (2008)    Movies                      Sales diffusion parameters  Number of posts (Volume)
Clemons et al. (2006)       Beer                        Sales growth rate           Average rating (Valence); Standard deviation (Variance)
Godes & Mayzlin (2004)      Television shows            TV viewership ratings       Entropy of post (Variance); Number of posts (Volume)
Chevalier & Mayzlin (2006)  Books                       Sales rank                  Average rating (Valence)
Park et al. (2007)          Portable multimedia player  Intention to buy            Quality of the reviews; Number of posts (Volume)
Bounie et al. (2005)        Video games                 Intention to buy            Review usage


Studies about the effect of review rating score find that the effect of low-end (1- and
2-star) and high-end (4- and 5-star) reviews on sales depends on characteristics of the
product, such as its price premium (Chevalier et al., 2006; Clemons, Gao and Hitt, 2006). However,
compared with medium ratings, strong ratings, whether low-end or high-end, appear to be
more attractive. According to Forman et al. (2008), reviews with strong ratings provide "a
great deal of information to inform purchase decision". Supporting this argument, Cao,
Duan and Gan (2011) concluded that reviews with extreme opinions receive more
helpfulness votes than reviews with neutral or mixed opinions.

The rating score is not the only element of the review that has an impact on
consumers. The variance in ratings among the reviews of a product also affects the
product's market performance (Awad and Zhang, 2006). Sun (2008) found that even
when a product has a low rating score, some high-rating reviews, which create high
variance in the review set, can increase the sales of that product. The existence of the high
ratings means that the product is still appreciated by several users, and potential
consumers may shift their focus to these high-rating reviews. Other factors such as
reviewer identity also play an important role in the influence of the reviewer on
consumers. Reviews with a disclosed reviewer identity receive more helpfulness votes from
consumers and enjoy an increase in sales (Forman et al., 2008). Similarly, Hu, Liu and
Zhang (2008) concluded that "the market responds more favorably to reviews written by
reviewers with better reputation and higher exposure".

The content of the text comments of reviews reportedly influences
product sales. Ghose and Ipeirotis (2004) reported that the subjectivity level of reviews
positively impacted the sales of electronic items. The sentiment in the text comment
expresses the attitude of the reviewer toward the quality of the product. This piece of
information is important to consumers because they want to know what others think
about the product (Pang and Lee, 2008). In other words, when reading reviews,
consumers are looking not only for confirmation of product features but also for the
reviewer's personal feelings about using the product. In addition, the sentiment of early reviews
impacted the sentiment of later reviews and indirectly affected the overall reputation of
the product (Gao, Gu and Lin, 2006; Sakunkoo and Sakunkoo, 2009).



2.2 Review manipulation 

Two common challenges review systems face are a lack of incentive to leave
feedback and the existence of dishonest feedback (Resnick, Zeckhauser, Friedman and
Kuwabara, 2000). Leaving detailed feedback is a time-consuming process. Many buyers
do not bother to leave feedback if there is no reward for doing so (Gao et al., 2006).
Lack of feedback can leave products thinly reviewed and susceptible to review
manipulation (Prawesh and Padmanabhan, 2012). Another issue for review systems is the
existence of shill reviews, which threaten the effectiveness of the review systems. Shill
reviews not only hurt consumers by tricking them into buying a product, but also hurt both
honest and dishonest sellers. If a shill attack is successful, honest sellers cannot sell their
products and will be eliminated from the market. The market will be filled with
lemon products and could eventually collapse (Akerlof, 1970).

Several attempts have been made to provide evidence about the existence of
review manipulation. Hu et al. (2011a) view review manipulation as review
management, which they define as "vendors, publishers or writers consistently monitoring
consumer online reviews, posting non-authentic messages to message board, or writing
inflated online reviews on behalf of customers when needed, with the goal of boosting
their product sales, in the online review context". By exploring book reviews on
Amazon.com, the authors revealed that review manipulation does exist for several
groups of books, namely non-bestseller books, popular and high-priced books, and books
whose reviews have high divergence in helpfulness votes. Also using reviews on
Amazon.com as the sample, Jindal et al. (2007) found that the problem of review
manipulation is widespread. After examining over 5.8 million reviews on Amazon, the
authors found "a large number of duplicate or near duplicate reviews written by the same
reviewers". Liu estimates that about 30% of online reviews are fake reviews. Another
study found that 10.3% of the products on Amazon.com are subject to online review
manipulation (Hu, Bose, Koh and Liu, 2012).

The downsides of review manipulation are that the action often has no effect and
can be very costly if the manipulator is caught (Dellarocas, 2004). As an example, the
Huffington Post reports that "Bestselling, award-winning crime author R.J. Ellory was caught
faking Amazon reviews for both his own books and the books of his competitors"⁶. The author
later issued an apology for this action. Such negative publicity has the potential to create
long-term damage to the reputation of the person caught faking reviews, potentially
causing online stores to refuse to sell the product or consumers to be reluctant to purchase
it, both of which could reduce revenue for the product being sold. As another example,
Legacy Learning Systems was fined $250,000 by the U.S. Federal Trade Commission (FTC) for
hiring affiliate marketers to write positive reviews⁷. The FTC also
caught Reverb Communications, a public relations firm, posting phony positive reviews
on iTunes without revealing it was being paid to do so⁸. Despite the existence of review
manipulation, there has been little research to understand or ameliorate it (Dellarocas,
2004; Hu et al., 2011a; Mayzlin, 2006).

⁶ http://www.huffingtonpost.com/2012/09/04/rj-ellory-fake-amazon-reviews-caught_n_1854713.html
⁷ http://ftc.gov/opa/2011/03/legacy.shtm
⁸ http://www.inc.com/news/articles/2010/08/ftc-settles-case-over-fraudulent-reviews.html

The difficulties of review manipulation research are ineffective detection methods
and a lack of labeled manipulated reviews (Hu et al., 2011a). There are several approaches
to prevent review manipulation. One approach is for websites to encourage more
reviews to be submitted, making it more difficult to change the average rating (Dellarocas,
2004). To encourage review submission, websites usually give some kind of reward to
reviewers⁹. Another, less common approach is to limit name changes by allowing users
to commit to their identification or by charging an entry fee (Friedman and Resnick, 2001).
This reduces shilling because on other sites it usually costs almost nothing to create a
user account, allowing shills to easily publish multiple reviews under different identities
without any consequences. An alternative solution that Amazon.com uses is to increase the
credibility of the reviews by providing certifications, such as publishing the reviewer's
real name or indicating that the review was written by an Amazon verified consumer.

⁹ Multiple websites such as Epinions.com and Ciao.co.uk reward their members for writing reviews.
Amazon.com recognizes reviewers' efforts in posting helpful reviews by creating lists such as
Amazon's Top Customer Reviewers and Hall of Fame Reviewers.

Besides prevention of review manipulation, three categories of shill review
detection have been developed to reduce the prevalence of shill reviews: review-centric,
reviewer-centric and item-centric. The review-centric approach detects reviews submitted
multiple times to multiple products. The reviews are then used to train a shill review
classifier (Jindal et al., 2007). The drawback of this approach is that not all shill reviews
are duplicates. The reviewer-centric approach can overcome this drawback by analyzing
the rating behaviors of individual reviewers to identify suspicious reviewers (Lim, Nguyen,
Jindal, Liu and Lauw, 2010). For example, reviewers who submit similar reviews for
many products are considered suspicious, allowing their reviews to be flagged as spam. The
primary drawback of the reviewer-centric approach is that it is ineffective when reviewers
use multiple identities. The item-centric approach focuses on analyzing the review set for
an item (Wu et al., 2010). If removing a group of reviews from
the review set significantly changes the ranking of the product, those reviews might
be spam reviews. The problem with this approach is that it assumes that there will be
homogeneity among reviews and has the potential to eliminate normal reviews,
especially for products with wildly varied customer opinions.
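
The duplicate-detection step of the review-centric approach can be sketched as follows. This is a minimal illustration, not the procedure of the cited studies: the use of word bigrams, Jaccard similarity and a 0.9 threshold are assumptions made here, and the function names are hypothetical.

```python
from itertools import combinations


def word_bigrams(text: str) -> set:
    """Set of adjacent lowercase word pairs in the text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(reviews, threshold=0.9):
    """Index pairs of reviews whose bigram Jaccard similarity meets the threshold."""
    shingles = [word_bigrams(r) for r in reviews]
    return [
        (i, j)
        for i, j in combinations(range(len(reviews)), 2)
        if jaccard(shingles[i], shingles[j]) >= threshold
    ]


if __name__ == "__main__":
    sample = [
        "great player with long battery life",
        "great player with long battery life",
        "the screen cracked after a week",
    ]
    print(near_duplicate_pairs(sample))
```

The pairwise comparison is quadratic in the number of reviews, so a collection of millions of reviews would need an approximate method; the sketch only shows the similarity criterion itself.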

2.3 Product feature extraction 

There are multiple methods to extract product features from the text comments of
reviews (Abulaish, Jahiruddin, Doja and Ahmad, 2009; Archak, Ghose and Ipeirotis,
2007; Liu, 2010). A sequential rule-based method was used to extract product features
(Liu, 2005). This method generates a set of rules about the location of product features in
a statement. Then, all the statements are matched against that set of rules so that the
features can be located. Hu and Liu (2004) and Dave, Lawrence and Pennock (2003) used
statistical patterns to detect product features. First, part-of-speech (POS) tagging is used to
identify nouns and noun phrases in the reviews. The nouns or noun phrases that appear
multiple times are classified as frequent features. A feature pruning process is then used to
eliminate redundant frequent features. This frequency-based method can be improved by
calculating the Pointwise Mutual Information (PMI) score between the phrase and
meronymy discriminators associated with the product class (Popescu and Etzioni, 2005).
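
For readers unfamiliar with PMI, the sketch below computes the standard pointwise mutual information of two items from co-occurrence counts. It is a generic illustration under assumed inputs (counts over some corpus of `total` units); it is not claimed to reproduce the exact scoring used by Popescu and Etzioni, and the function name and example counts are hypothetical.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information (base 2) from co-occurrence counts.

    count_xy: units containing both the candidate phrase and the discriminator
    count_x, count_y: units containing each item on its own
    total: number of units in the corpus
    """
    if min(count_xy, count_x, count_y) <= 0:
        return float("-inf")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    # Hypothetical counts: a phrase that co-occurs with a discriminator
    # ten times more often than chance would predict.
    print(round(pmi(10, 20, 50, 1000), 4))
```

A score near zero indicates that the phrase and the discriminator co-occur about as often as chance would predict, while a large positive score suggests that the phrase is a genuine part or property of the product class.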

2.4 Perceived quality 

There are two different kinds of quality: objective quality and perceived quality.
Objective quality is defined as "the technical superiority or excellence of the product"
(Zeithaml, 1988). Objective quality is the true quality of the product and is often stable
(Clark et al., 2009). The objective quality of a product is observable and can be measured
against predetermined standards. According to the literature, customer reviews cannot reveal
a product's objective quality because of biases and influences such as self-selection bias and
cultural influence (Hu, Pavlou and Zhang, 2006; Koh, Hu and Clemons, 2010; Moe and
Trusov, 2011; Schlosser, 2005). Objective quality is not the target of this study.

Perceived quality is defined as "the consumer's judgment about a product's overall 
excellence or superiority" (Zeithaml, 1988). Perceived quality is not the same as the 
objective quality of the product. It is what the consumers think the quality of the product 
might be. Perceived product quality is an important factor that impacts consumer 
behaviors such as intention to buy or product selection (Jacoby, Chestnut, Hoyer, Sheluga 
and Donahue, 1978; Sawyer, 1975; Tsiotsou, 2005). Therefore, one way to influence 
purchasing behavior is to influence perceived product quality. Shill reviews are used to 
change the perceived quality judgments of potential customers. The objective of this 
study is to measure how effective the shill reviews are in accomplishing this task. 
3. Linguistic Characteristics of Shill Reviews

3.1 Development of hypotheses 

To understand the characteristics of shill reviews, we examine the differences 
between shill and normal reviews. The main differences between shill reviewers and 
normal reviewers are reflected by reviewers' knowledge about the product and product 
usage experience. We assume that shill reviewers have never used the target product, a 
reasonable assumption since it is too costly to send most products to shill reviewers and 
to compensate shill reviewers for the time necessary to actually evaluate the product. 

Informativeness of a review is defined as the amount of product information 
provided in the review (Liu, Cao, Lin, Huang and Zhou, 2007). Product information is 
represented by product features mentioned in the review. Product features are divided 
into two categories: official features and unofficial features. An official feature is a noun 
or a noun phrase about the product which is included in the official product description. 
Official features are usually the product information that a consumer sees when reading 
the description of the product. An official feature is public information which is usually 
provided by the manufacturer of the product. An unofficial feature is also a noun or a 
noun phrase about the product. However, unofficial features are not a part of the product 
description. An example of an unofficial feature is the word "case", which might not be
mentioned in the product description but is described in the reviews as an accessory that
comes with the device. Hence, unofficial features are private information known only to
users of the product.
Product features in normal reviews may differ from those found in shill reviews.
Expectancy theory posits that "the motivation force experienced by an individual to select 
one behavior from a larger set is some function of the perceived likelihood that that 
behavior will result in the attainment of various outcomes weighted by the desirability of 
these outcomes to the person" (Oliver, 1974). Because the reward from the act of writing 
a shill review is not high, expectancy theory suggests that shill reviewers will not spend 
time looking for additional information about the product but rather use the readily 
available information provided by the product descriptions when writing their reviews. 
This assumption is consistent with a study about criminal behavior which found that the 
amount of reward from a criminal act significantly impacts the intensity of the criminal 
activity (Viscusi, 1986). Shill reviewers are unlikely to know about the unofficial features 
of the product and consequently their reviews will contain fewer unofficial product 
features and more official features. Thus, we hypothesize that: 

H1a: Shill reviews contain more official features per sentence than normal
reviews.

H1b: Shill reviews contain fewer unofficial features per sentence than normal
reviews.

H1c: The percentage of sentences containing official features in shill reviews is
higher than that of normal reviews.

H1d: The percentage of sentences containing unofficial features in shill reviews is
lower than that of normal reviews.
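
The two quantities named in these hypotheses, features per sentence and the percentage of sentences containing a feature, can be sketched as below. This is a simplified illustration rather than the study's extraction method: it assumes features are single lowercase words supplied in advance (the text above also allows noun phrases), it splits sentences on terminal punctuation, and all names and example strings are hypothetical.

```python
import re


def split_sentences(text: str) -> list:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def feature_stats(review: str, features: set) -> tuple:
    """Return (feature mentions per sentence, share of sentences with at least one feature)."""
    sentences = split_sentences(review)
    if not sentences:
        return 0.0, 0.0
    mentions = 0
    sentences_with_feature = 0
    for sentence in sentences:
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        hits = sum(1 for token in tokens if token in features)
        mentions += hits
        if hits > 0:
            sentences_with_feature += 1
    return mentions / len(sentences), sentences_with_feature / len(sentences)


if __name__ == "__main__":
    official = {"battery", "speaker"}   # hypothetical words from a product description
    unofficial = {"case"}               # hypothetical word found only in reviews
    review = "The battery and speaker are fine. Nice case."
    print(feature_stats(review, official))
    print(feature_stats(review, unofficial))
```

Computing both statistics separately for the official and unofficial feature sets gives the four per-review measures that H1a through H1d compare across shill and normal reviews.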
Product usage experience is measured using the subjectivity and objectivity of the
sentences in the reviews. A subjective sentence "gives a very personal description of the
product" and an objective sentence "lists the characteristics of the product" (Ghose et al.,
2004). An example of a subjective sentence is "It's really a great little player". An
example of an objective sentence is "It even includes a computer USB interface and
built-in speaker". Knapp, Hart and Dennis (2006) stated that liars usually avoid
statements of ownership because they lack personal experience. Consistent with this
argument, Newman, Pennebaker, Berry and Richards (2003) showed that one of the
important factors that distinguish deceptive sentences from other sentences is
self-reference.

The findings suggest that shill reviewers will avoid subjective statements in their 
reviews because they have never actually used or owned the product. Instead, they are 
more likely to focus on describing the product. In contrast, since normal reviewers have 
used the product, they have the experience using the product and will be confident in 
expressing their feelings about the product they used. So, normal reviews are expected to 
include more subjective sentences than shill reviews. We hypothesize that: 

H2: Shill reviews are less subjective than normal reviews. 

Readability can be another measure to compare shill and normal reviews.
Readability is defined as the cognitive effort required for a person to comprehend a piece
of text (Zakaluk and Samuels, 1988). Readability is usually measured by the length of the
text, the complexity of the words and the number of sentences. Readability characteristics
have been used as linguistic cues to detect text deception (Afroz, Brennan and
Greenstadt, 2012; Daft and Lengel, 1984). Moffitt and Burns (2009) find that fraudulent
financial reports usually contain more complex words, making them less readable than
truthful ones. Reviewing the literature, Vartapetiance and Gillam (2012) suggest that
texts that are less readable are likely to be deceptive. Thus, we hypothesize that:

H3: Shill reviews are less readable than normal reviews.
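
To show how a readability score of this kind is computed, the sketch below implements the Automated Readability Index (ARI), a standard formula based on characters per word and words per sentence. The choice of ARI is an assumption made for illustration; this passage does not name the five measures used in the study, and the tokenization rules and function name are hypothetical simplifications.

```python
import re


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43.

    Higher values indicate text that is harder to read.
    """
    # Words are runs of letters or digits, optionally with one contraction suffix.
    words = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # Count only letters and digits as characters.
    characters = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


if __name__ == "__main__":
    print(automated_readability_index("It is a great little player."))
    print(automated_readability_index("Exceptional multimedia functionality."))
```

Under H3, a score such as this one would be expected to be higher on average for shill reviews than for normal reviews.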

3.2 Data Collection 

To explore the linguistic characteristics of shill reviews, a collection of shill and
normal reviews is required. Shill reviews are reviews submitted by shills who have an
undisclosed relationship with the seller. Unlike a shill review, a normal review is free of
undisclosed relationships between seller and reviewer. The collected shill and
normal reviews are compared to reveal their differences in informativeness,
subjectivity and readability.

3.2.1 Shill review collection 

Studies of reputation manipulation require a dataset of labeled shill reviews.
It is difficult to obtain a labeled shill review dataset from publicly available reviews
because there is no effective method to classify them as shill reviews. Several studies
have collected duplicate reviews and labeled them as "spam reviews" (Jindal et al., 2007;
Jindal and Liu, 2008). While this approach is appropriate for some research on shill
reviews, it is not appropriate for this study because a sufficient quantity of reviews for a
specific product is required. Because of this challenge, shill reviews must be collected as
primary data.

In this study, shill reviews were collected via a data collection procedure in which 
the subjects were asked to become shills and intentionally write positive reviews for an 
MP3 player. The participants were undergraduate students at the University of Colorado 
Denver. Undergraduate students, a convenience population, were chosen for this study 
because they are active technology and internet users. The students received some course 
credit as a reward for writing the shill reviews. To increase the quality of the reviews, 
we offered a chance to win a $20 gift card to the five reviewers whose reviews ranked in the 
top five most helpful reviews. Each participant could submit more than one shill review. 

To simulate real conditions for writing shill reviews, the product information 
available to the subjects was limited. The subjects were provided with the product 
specifications and two pictures of the product. To ensure that the reviewers would not 
seek the product's information or its reviews online, product identification information 
such as brand name, product name and model number were changed. The price of the 
product was also hidden from the reviewer. With the provided information, the subjects 
were asked to rate the product and write a short review title and a text comment. 
Although the shill reviewers were asked to submit positive shill reviews, the reviewers 
were not given specific instructions about review content, structure, and format. The 
subjects had no specified time limits to write the reviews. 

3.2.2 Normal review collection 

While the shill reviews were collected as primary data, normal reviews were 
collected from Amazon.com. Although there is no foolproof way to verify the absence of 
undisclosed relationships behind the reviews collected, the risk was reduced by only 
including reviews that either disclosed the reviewer's real name or carried the Amazon.com 
verified purchase badge. Shill reviewers typically will not disclose their real names in shill 
reviews because of the risk of losing reputation. It is also unlikely that a shill reviewer would 
actually buy the product just to obtain the "Amazon.com verified purchase" badge because 
doing so would increase the cost of the shill reviews submitted. 

3.3 Linguistic characteristics 

The linguistic characteristics analyzed in this study are informativeness, 
subjectivity and readability. The informativeness of the reviews is measured by the 
quantity of the product features included in a review. Product features are extracted by 
feature extraction methods. In the following subsection, the background on current 
approaches to extract product features is discussed and the Description-based Feature 
Extraction Method (DFEM) is described. DFEM integrates existing text mining tools to 
capture features from product reviews. 

3.3.1 Product feature extraction background 

Liu (2010) divides product features into two categories: explicit features and 
implicit features. An explicit feature is usually described by a noun or a noun phrase. For 
example, in the sentence, "The touch screen of this MP3 player is very sensitive", the 
explicit product feature mentioned is the "touch screen". An implicit feature does 
not mention a product feature directly. However, an explicit feature can be inferred from 
it. An implicit feature can take any form. For example, bad durability can 
be inferred from the sentence, "This camera dies after 3 days of use". Another challenge 
for feature extraction methods is that some features are context-dependent. For 
example, the word "pen" may not be a feature of a TV set but can mean the "stylus" of a 
touch screen MP3 player. 

According to Liu (2010), there are two popular formats of product reviews. 
Format 1 includes a list of pros/cons at the beginning followed by the explanation text. 
Format 2 only includes the explanation text. The difference between Format 1 and Format 2 
is the pros/cons section. According to the author, that section should be handled by a 
different method. Figure 3.1 shows an example of Format 1 and Format 2. 

Format 1 

Pros: 

Portability 
Sound Quality 
Expandable Memory 

Cons: 

Battery Life could be better 

Included headphones definitely do not do the player justice. 

While the smart phone replaces most mp3 players, would you want to drop your 500-700 
dollar phone while running or at the gym and risk breaking it? Enter the Clip Zip. For a 
mere 100 dollars you can have a 36gb mp3 (4gb + 32gb microSD card) player that is the 
size of a book of matches, and has amazing sound quality. The improvements over the 
clip+ include a color screen for album art, support for your AAC files, and alphabetical 
browsing... 

Format 2 

I purchased a Clip+ Plus over a year ago and have enjoyed so much that I purchased a 
Clip Zip as a backup as I never wanted to be without my portable music. 

At first glance it seems to be a slightly upgraded model with a color screen and a stop 
watch. Which just about covers the main changes. 

The problem comes when you start loading it with music and play lists. Just like the Plus 
model it supports external memory cards up to 32gb. However, unlike the Plus model, the 
Zip model does NOT support play lists stored on the external memory card. As I see it 
this makes the memory card a useless waste of space, no way I'm going to navigate 32gb 
of songs one at a time. 

Figure 3.1 Examples of review format 

Liu (2005) uses a sequential rule-based method to extract features from the 
pros/cons section of Format 1. The basic idea behind this method is to generate a set of 
rules about where the product feature might be located in a statement. Then, all the 
statements are matched against that set of rules to locate the features. The 
strength of this method is that it can detect not only explicit features but also implicit 
features. 

For the second format, there are multiple methods to extract product features from 
a review (Abulaish et al., 2009; Archak et al., 2007). Hu et al. (2004) and Dave et al. 
(2003) use statistical patterns to detect product features. First, Part-of-Speech (POS) tagging 
is used to identify nouns and noun phrases in the reviews. The nouns or noun phrases 
which appear multiple times are classified as frequent features. A feature pruning 
process then removes redundant frequent features. Next, the sentiment 
adjectives associated with the retrieved frequent features are identified. These sentiment 
words are then used to detect the infrequent features. Popescu et al. (2005) improve the 
frequent feature pruning method above by calculating the Pointwise Mutual Information (PMI) 
score between the phrase and meronymy discriminators associated with the product class. 
For more information about this method, see Popescu et al. (2005). 

3.3.2 Informativeness 

The goal of DFEM is to identify product features mentioned in product reviews 
and classify them into official features and unofficial features. Feature detection is 
context-dependent because a term can have different meanings in different contexts. A 
noun or noun phrase might describe a feature of one product category but not a feature 
of other products. For example, the word "note" might refer to the ability of an MP3 player 
to record voice notes, but the same word does not describe a feature of a coffee maker. 
Although DFEM can run fully automatically, an optional manual step can improve 
its classification accuracy. DFEM uses basic NLP techniques such as POS tagging, 
sentence splitting, approximate matching, word stemming and spell checking to 
preprocess the reviews, and uses the publicly available product description to filter the 
features (Manning and Schutze, 1999). 

Figure 3.2 gives an overview of the product feature extraction approach used in DFEM. 
Given the target product and product category, DFEM first collects the target product 
technical description. Then, it crawls to get all reviews of the target product, as well as the 
technical descriptions and reviews of all products in the same category as the target 
product. After that, the reviews of the target product are preprocessed for POS tagging. 
Next, the nouns and noun phrases extracted from the reviews of the target product are 
compared with the ones in the target product technical description. If a term is part of the 
product description, it is classified as an official feature. The terms which do not appear in 
the product description go through a filtering process that uses the technical descriptions of 
other products in the same category to identify which terms represent unofficial features of 
the product (as opposed to simply nouns that are unrelated to the product category). 

[Figure 3.2 flowchart elements: Crawl reviews of target product; Target product technical 
description; Target product reviews; Technical description of products in target category; 
Reviews of products in target category; Unofficial features] 

Figure 3.2 The Description-based Feature Extraction Method 

The data collected from the review crawling process is sufficient for our study. 
226 reviews were collected for the target product, an off-brand MP3 player. Table 3.1 
shows the distribution of target product reviews. The four-star and five-star reviews were 
used to create the positive normal review dataset, which is compared with the 
positive shill review dataset. The one-, two- and three-star reviews were not used in 
this study. 

Table 3.1 Distribution of target product reviews 

Rating (stars)    Quantity 
[illegible]       69 
[illegible]       27 
[illegible]       25 
[illegible]       56 
[illegible]       49 


All the data related to the review (except for the username of the reviewer) were 
collected from Amazon.com. 

Table 3.2 shows the fields in the review table, which covers both the target product 
reviews and the reviews of other products in the MP3 category. The fields RealName and 
Verified are used to verify the authenticity of the review. The review crawling process 
yielded the descriptions of 2,225 MP3 products with 68,981 reviews. The product 
descriptions were used to filter official and unofficial features, and the product reviews were 
used to check the popularity level of candidate terms. 339 products were eliminated because 
they did not include product descriptions. In addition, multiple products from the same 
manufacturer have the same or similar product descriptions, which could cause a problem 
when they are used to filter unofficial product features. These products were not eliminated 
because they have 
different customer reviews. 

Table 3.2 Data structure of a review 

Field               Datatype   Description 
ReviewID            String     A combination of the Amazon.com ASIN code and the position of the review 
Rating              Integer    The rating of the review 
Title               String     The title of the review 
Comment             String     The text comment of the review 
Helpfulness rating  Ratio      The helpfulness vote of the review, stored as a string in the format "# of #". For example, "7 of 8" means 7 out of 8 shoppers consider the review helpful. 
ASIN                String     The ASIN number of the product. The ASIN is the private product ID on Amazon.com 
ReviewDate          Date/Time  The date the review was submitted 
RealName            Boolean    Yes: the reviewer's real name is disclosed. No: the reviewer's real name is not disclosed 
Verified            Boolean    Yes: the reviewer's purchase is verified. No: the reviewer's purchase is not verified 
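
As an illustration only, the record in Table 3.2 could be represented as follows. This is a minimal sketch, not the data model actually used in the study: the class name `Review`, the snake_case field names, the helper `parse_helpfulness`, and the sample values are all assumptions introduced here. The filter mirrors the inclusion rule of Section 3.2.2 (real name disclosed or purchase verified).

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


def parse_helpfulness(text: str) -> Tuple[int, int]:
    """Parse a '# of #' helpfulness string such as '7 of 8'."""
    helpful, total = text.split(" of ")
    return int(helpful), int(total)


@dataclass
class Review:
    # Field names follow Table 3.2; the Python names are hypothetical.
    review_id: str      # ASIN code combined with the position of the review
    rating: int
    title: str
    comment: str
    helpfulness: str    # stored in the format '# of #'
    asin: str
    review_date: date
    real_name: bool
    verified: bool

    def helpfulness_ratio(self) -> float:
        """Share of voters who found the review helpful."""
        helpful, total = parse_helpfulness(self.helpfulness)
        return helpful / total if total else 0.0

    def passes_authenticity_filter(self) -> bool:
        """Keep reviews with a disclosed real name or a verified purchase."""
        return self.real_name or self.verified


# Made-up example record.
example = Review("B000EXAMPLE-1", 5, "Great", "Works well", "7 of 8",
                 "B000EXAMPLE", date(2012, 1, 15), False, True)
print(example.helpfulness_ratio(), example.passes_authenticity_filter())
```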



To identify the product features, one important task is to classify the type of each 
word in a sentence. The reviews must be broken into sentences before words are 
classified. After the reviews are broken into sentences, the sentences are tokenized. 
These tokens are the inputs for the POS tagging tool. We used the POS tagging tool from 
the OpenNLP toolkit 10. Table 3.3 shows an example of a POS-tagged sentence. In this 
example, the first row contains the tokens, the second row contains the POS tags and the 
third row contains the chunk tags. "grandson" is identified as a noun and "bought" as a 
verb. For the full explanation of the word type abbreviations, see the Penn Treebank II 
Tags 11. The next step involves phrase identification. The Chunker tool from OpenNLP was 
used for phrase identification. The Chunker identifies "a birthday gift" and "our grandson" 
as noun phrases. 

10 http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.postagger 
11 http://bulba.sdsu.edu/jeanette/thesis/PennTags.html 

Original sentence: We bought this for a birthday gift for our grandson. 

Table 3.3 Example of POS Tagging 

Token      We    bought  this  for   a     birthday  gift  for   our   grandson 
POS tag    PRP   VBD     DT    IN    DT    NN        NN    IN    PRP$  NN 
Chunk tag  B-NP  B-VP    B-NP  B-PP  B-NP  I-NP      I-NP  B-PP  B-NP  I-NP 
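
To make the chunk tags concrete, the sketch below groups the tokens of Table 3.3 into noun phrases by reading the B-NP (begin) and I-NP (inside) tags. It is a plain-Python illustration of how such tags are interpreted, not the OpenNLP Chunker itself; the function name `noun_phrases` is hypothetical.

```python
from typing import List


def noun_phrases(tokens: List[str], chunk_tags: List[str]) -> List[str]:
    """Group tokens into noun phrases using B-NP / I-NP chunk tags."""
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, chunk_tags):
        if tag == "B-NP":                  # a new noun phrase starts here
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I-NP" and current:    # continue the open noun phrase
            current.append(token)
        else:                              # any other tag closes the open phrase
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


tokens = "We bought this for a birthday gift for our grandson".split()
chunks = ["B-NP", "B-VP", "B-NP", "B-PP", "B-NP",
          "I-NP", "I-NP", "B-PP", "B-NP", "I-NP"]
print(noun_phrases(tokens, chunks))
# ['We', 'this', 'a birthday gift', 'our grandson']
```

The single-word phrases "We" and "this" are pronoun and determiner chunks; they would be discarded later, since only nouns and noun phrases that survive preprocessing are treated as candidate terms.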






After the nouns and noun phrases are identified, they are preprocessed. The 
preprocessing step involves removal of stopwords, spell checking, singularizing, word 
stemming and approximate matching. The preprocessing step produces two lists of 
terms: product technical description terms and review terms. To ensure that all official 
terms are detected, it is necessary to find the synonyms of the terms currently on the list. 
For example, the terms "headphone" and "earphone" refer to the same product feature. So if 
the word "headphone" is already in the list, the synonym generator will add the word 
"earphone". In this study, we used the SynSet tool 12 to find the synonyms of given 
words. Since finding synonyms is context-dependent, a general tool cannot find a complete 
list of synonyms. To compensate, an additional manual step added terms missing 
from the list produced by the synonym generating tool. This step should increase the 
accuracy of the classification method. 



12 http://lyle.smu.edu/~tspell/jaws/doc/edu/smu/tspell/wordnet/impl/file/synset/package-summary.html 

3.3.2.1 Official feature detection 

By definition, official features are mentioned in the product description. 

Let: 

$F = \langle f_1, f_2, \ldots, f_n \rangle$ be the set of features in the product description. 

$T = \langle t_1, t_2, \ldots, t_m \rangle$ be the set of terms in the reviews of the target product. 

$S$ be the set of all sentences in the target product reviews. 

Each $s_i \in S$ contains a subset of $T$. 

A term $t_j$ is an official feature if $t_j = f_i$ for some $f_i \in F$. 

If $t_j$ is an official feature, it is removed from $T$. Therefore, after the official 
feature detection step, $T$ becomes $T'$, which only contains terms that are not official 
features. 
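
The official feature detection step amounts to a set membership test. The following is a minimal sketch under the assumption that both the review terms and the description features have already been preprocessed (stemmed, singularized, synonyms expanded); the function name `split_official` and the sample terms are made up for illustration.

```python
from typing import Iterable, List, Set, Tuple


def split_official(review_terms: Iterable[str],
                   description_features: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (official features, remaining terms T')."""
    official: List[str] = []
    remaining: List[str] = []
    for term in review_terms:
        if term in description_features:   # t_j equals some f_i in F
            official.append(term)
        else:                              # t_j stays in T'
            remaining.append(term)
    return official, remaining


# Hypothetical data: F after synonym expansion, T from the reviews.
F = {"touch screen", "battery", "headphone", "earphone"}
T = ["battery", "earphone", "size", "friend"]
official, T_prime = split_official(T, F)
print(official)   # ['battery', 'earphone']
print(T_prime)    # ['size', 'friend']
```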

3.3.2.2 Unofficial feature detection 

All the terms left in $T'$ are nouns and noun phrases that are not official features. 
However, not all nouns and noun phrases in $T'$ are unofficial features of the product. A 
feature pruning process is needed to eliminate terms that are unlikely to be product 
features. In prior research, multiple pruning steps were used to extract only those features 
that appear frequently enough (Hu et al., 2004). Although this approach has been 
successful in detecting many features, it might also ignore many features which do not 
appear frequently. For smaller datasets, like the one used for this study, it is necessary to 
try to detect all the possible features even when they appear just a few times in the 
reviews. 

After filtering out all the official features, the remaining nouns and noun phrases are 
possible unofficial features. However, many of these terms are not really unofficial 
features. A feature pruning process is used to detect the non-feature terms. The feature 
pruning process has two stages. In the first stage, the product descriptions of the other 
products in the same category as the target product are used to identify phrases that are 
likely to be unofficial features. If a term is a product feature, even though it is not 
mentioned in the description of the target product, it is likely to appear in the descriptions 
of other products in the same category. 

For every $t_j \in T'$, if the product descriptions of at least $k$ brands contain $t_j$, 
the term $t_j$ goes to stage two of the pruning process. $k$ is an arbitrary parameter. In 
this study, $k = 5$ (10% of the total quantity of brand names). This value is reasonable in 
the category of MP3 players because these players share many common features. In other 
product categories in which a feature is not supported by many brand names, the value of 
$k$ should be reduced. If a term is included in the descriptions of products of 5 different 
brand names, it has a good chance of being a product feature. The quantity 
of brand names is counted instead of the quantity of products because many products have 
identical or nearly identical descriptions. If a term appears in the descriptions of many 
products, it does not necessarily mean that it is a feature. Therefore, the brand count is a 
stronger measure of feature popularity. 
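
The first pruning stage can be sketched as a count of distinct brands per candidate term. This is an illustrative sketch only: the function name `stage_one`, the naive substring matching, and the toy brand data are assumptions, and the demonstration uses k = 2 rather than the study's k = 5 so that the small example produces a result.

```python
from typing import Dict, List, Set


def stage_one(candidates: List[str],
              descriptions_by_brand: Dict[str, List[str]],
              k: int = 5) -> List[str]:
    """Keep candidates found in the descriptions of at least k distinct brands."""
    kept: List[str] = []
    for term in candidates:
        brands: Set[str] = set()
        for brand, descriptions in descriptions_by_brand.items():
            # Naive case-insensitive substring match; several products of
            # the same brand still count as a single brand.
            if any(term.lower() in d.lower() for d in descriptions):
                brands.add(brand)
        if len(brands) >= k:
            kept.append(term)
    return kept


# Hypothetical descriptions; BrandA has two near-identical products.
descriptions_by_brand = {
    "BrandA": ["FM radio and voice recorder", "FM radio, 4 GB storage"],
    "BrandB": ["Built-in FM radio"],
    "BrandC": ["Voice recorder included"],
}
candidates = ["fm radio", "voice recorder", "friend"]
print(stage_one(candidates, descriptions_by_brand, k=2))
# ['fm radio', 'voice recorder']
```

Counting brands rather than products means that the two BrandA products contribute only one vote for "fm radio", which reflects the reasoning given above.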

The second stage of the feature pruning process eliminates extremely popular 
terms which are not product features. For example, although the word "friend" might be 
included in the descriptions of the products of 5 or more brand names, it is not a product 
feature. To make sure that a term is not an extremely popular term, we count its 
occurrences in the reviews of all other products in the category. If a term appears in 
more than $p$ reviews, it is classified as an extremely popular term and eliminated from the 
unofficial feature list. $p$ is an integer. In this study, if an unofficial feature 
appears in more than 5% of the reviews, it is considered extremely popular. 
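
The second pruning stage can be sketched in the same spirit. The function name `stage_two`, the substring matching and the toy review corpus below are assumptions made for illustration; only the 5% threshold comes from the text.

```python
from typing import List


def stage_two(candidates: List[str], reviews: List[str],
              max_share: float = 0.05) -> List[str]:
    """Drop candidates that appear in more than max_share of the reviews."""
    p = max_share * len(reviews)   # threshold expressed as a review count
    kept: List[str] = []
    for term in candidates:
        hits = sum(1 for review in reviews if term.lower() in review.lower())
        if hits <= p:              # not an extremely popular term
            kept.append(term)
    return kept


# Hypothetical corpus of 40 reviews from other products in the category.
reviews = (["my friend loves it"] * 10
           + ["the fm radio works"] * 2
           + ["good sound"] * 28)
print(stage_two(["friend", "fm radio"], reviews))
# ['fm radio']
```

Here "friend" appears in 10 of 40 reviews (25%) and is eliminated, while "fm radio" appears in 2 of 40 (exactly 5%, not more) and is kept.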

3.3.2.3 Performance 

The description-based feature classification method was used to detect and 
classify the features in the reviews of the target product. The review set included 60 
positive shill reviews and 93 positive normal reviews. To measure the performance of the 
description-based feature classification method, the reviews were manually read and 
features were tagged. Then, results of the automatic method were compared to the 
manual classification results. Recall, precision, and harmonic mean (F) were used as 
measures of performance. 

$$\text{recall} = \frac{TP}{TP + FN}; \quad \text{precision} = \frac{TP}{TP + FP}; \quad F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
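
The three measures are straightforward to compute. The sketch below is illustrative (the function names are not from the study); as a consistency check it recomputes the harmonic mean from the recall and precision values reported in Table 3.5.

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and their harmonic mean (F) from counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


def harmonic_mean(precision: float, recall: float) -> float:
    """F as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs as reported in Table 3.5.
reported = {
    "Step 1": (0.83, 0.96),
    "Step 2": (0.96, 0.86),
    "Step 3": (0.94, 0.91),
}
for step, (recall, precision) in reported.items():
    print(step, round(harmonic_mean(precision, recall), 2))
# Step 1 0.89
# Step 2 0.91
# Step 3 0.92
```

The recomputed values agree with the F column of Table 3.5.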

Table 3.4 contains the total quantity of features identified by the human tagger. There 
are 3058 nouns and noun phrases in the review set. 1822 of them are product-related 
terms. 82.93% of the features mentioned in the reviews (1511 of 1822) are official features. 
Table 3.5 shows that the performance of the description-based feature detection and 
classification method is very promising. After step 1, official feature detection, 1589 terms 
were classified as product features. Because this step only detects official features, not all 
the features, the recall is relatively low (0.83). The overall precision in step 1 is high 
because the tasks of detecting official features automatically and manually using the human 
tagger are very 
similar. Both look for terms mentioned in the product description. 

Table 3.4 Result of manual classification 

Quantity of manual features 
# of features/# of terms    Official    Unofficial 
1822/3058                   1511        311 



Table 3.5 Performance of DFEM 

Step                                   Quantity  Recall  Precision  Harmonic mean (F) 
Step 1: Official features detection    1589      0.83    0.96       0.89 
Step 2: Unofficial features detection  1980      0.96    0.86       0.91 
Step 3: Popular term elimination       1870      0.94    0.91       0.92 



Step 2 detected 391 additional product-related terms. Since many of the unofficial 
product features are included in the newly detected terms, the recall improved 
substantially, from 0.83 to 0.96. However, many of the newly detected terms are not 
actual unofficial features, reducing the precision from 0.96 to 0.86. Step 3 reduced the 
number of false positive unofficial features, yielding a recall of 0.94 and a 
precision of 0.91. The results suggest that DFEM is an effective method for 
detecting both official and unofficial features in product reviews. 

3.3.3 Readability 

Readability has been used in prior studies to predict the usefulness of customer 
reviews (Korfiatis, 2008; O'Mahony and Smyth, 2010). Five popular readability 
measures are used in this study, namely the Fog-Index (Gunning, 1969), the Flesch 
Reading Ease Index (Flesch, 1951), the Automated Readability Index, the Coleman-Liau 
Index (Coleman and Liau, 1975) and the SMOG Index (Laughlin, 1969). 

3.3.3.1 The Fog-Index 

Developed in the 1940s, the Fog-Index was used to measure the readability of 
newspaper writing. It measures how well an individual with an average high school 
education can read an evaluated piece of text. The value range of the Fog index is from 1 
to 12. A lower Fog index means more readable text. The Fog index of each review can be 
calculated as follows: 

$$\text{Fog} = 0.4 \times \left( \frac{N(\text{words})}{N(\text{sentences})} + 100 \times \frac{N(\text{complex words})}{N(\text{words})} \right)$$

Where: 

• complex word: a word with three syllables or more 

3.3.3.2 The Flesch-Kincaid index (FK) 

The Flesch-Kincaid or Flesch Reading Ease index is used to estimate the level 
of education needed to understand a piece of text. The FK index calculation is 
based on syllables per word and words per sentence. The value of this index ranges from 0 
to 100, with smaller scores indicating less readable text. Text with an FK index higher 
than 60 can be understood by almost everyone. Advanced content such as the Harvard Law 
Review has scores in the 30s, indicating a level understood by law school students. The 
FK index of each review can be calculated as follows: 

$$\text{FK} = 206.835 - 1.015 \times \frac{N(\text{words})}{N(\text{sentences})} - 84.6 \times \frac{N(\text{syllables})}{N(\text{words})}$$

3.3.3.3 The Automated Readability Index 

This index is simpler than the other two indexes. Its calculation 
uses the quantity of characters (excluding standard punctuation such as hyphens and 
semicolons) per word and the quantity of words per sentence to measure the readability of 
the text. The AR index ranges from 1 to 12, indicating the grade level needed to understand 
the text. For example, AR = 5 means that a fifth grade education is required to understand 
the review. The AR index can be calculated as follows: 

$$\text{ARI} = 4.71 \times \frac{N(\text{characters})}{N(\text{words})} + 0.5 \times \frac{N(\text{words})}{N(\text{sentences})} - 21.43$$



3.3.3.4 Simple Measure of Gobbledygook (SMOG) 

Simple Measure of Gobbledygook is a readability measure proposed by 
Laughlin (1969). SMOG is widely used, especially for health documents. The main 
component of the SMOG method is the polysyllable, defined as a word with 3 or more 
syllables. The formula is a regression of the interaction between the length of words and 
sentences. The SMOG result also ranges from 1 to 12. SMOG is calculated as follows: 

$$\text{SMOG} = 1.043 \sqrt{30 \times \frac{\text{quantity of polysyllables}}{\text{quantity of sentences}}} + 3.1291$$
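
The four indexes described in this section can be computed from simple counts. The sketch below uses the standard published forms of the formulas and takes the counts as inputs; how words, sentences and syllables are counted (the syllable counter in particular) is left out and would have to be supplied by a text-processing step. The function names and the sample counts are assumptions made for illustration.

```python
import math


def fog_index(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog index; complex words have three or more syllables."""
    return 0.4 * (words / sentences + 100 * complex_words / words)


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """Automated Readability Index based on characters per word."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


def smog_index(polysyllables: int, sentences: int) -> float:
    """SMOG; polysyllables are words with three or more syllables."""
    return 1.043 * math.sqrt(30 * polysyllables / sentences) + 3.1291


# Made-up counts for a 100-word review with 5 sentences.
print(fog_index(words=100, sentences=5, complex_words=10))
print(flesch_reading_ease(words=100, sentences=5, syllables=140))
print(automated_readability_index(characters=450, words=100, sentences=5))
print(smog_index(polysyllables=10, sentences=5))
```

Note the direction of each scale: a higher Flesch Reading Ease score means easier text, whereas higher Fog, ARI and SMOG values mean harder text.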



3.5 Subjectivity Analysis 

The purpose of subjectivity analysis is to classify sentences in a text as subjective 
or objective. Subjectivity analysis is usually one step in a multi-step process to extract 
the polarity 13 of a review (Pang and Lee, 2004a). Using movie reviews, Pang and Lee 
(2004a) used subjectivity analysis to detect subjective sentences in the reviews. 
According to these authors, the polarity of a review can only be extracted from the 
subjective sentences. The subjectivity analysis approach of Pang and Lee is 
implemented in a software module included in the LingPipe 14 toolkit. This toolkit was 
used for subjective/objective sentence classification in this study. To classify the 
sentences, training data of labeled subjective and objective sentences must be obtained. 

One approach to automatically obtain labeled objective/subjective sentences is 
to use product descriptions and product reviews (Ghose et al., 2004). Objective sentences 
are extracted from the product description pages and subjective sentences are extracted from 
the product reviews. About 3800 objective sentences were retrieved from the product 
description pages on Amazon.com. More than 200 thousand sentences were collected 
from the product reviews. To create two datasets of the same size, 3800 sentences were 
randomly selected from the subjective sentence dataset. 90 percent of each dataset 
was used for training and the remaining 10 percent was used for 
testing. Table 3.6 shows the confusion matrix of the subjectivity classifier. 

Table 3.6 Subjectivity classifier confusion matrix 

                Response 
Reference       Objective   Subjective 
Objective       399         10 
Subjective      14          386 
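
Summary measures can be derived from the confusion matrix in Table 3.6. The sketch below assumes that rows are the reference (true) labels and columns are the classifier responses; that orientation is an assumption about the table layout, and the variable and function names are illustrative.

```python
# matrix[reference][response], values taken from Table 3.6.
matrix = {
    "objective":  {"objective": 399, "subjective": 10},
    "subjective": {"objective": 14,  "subjective": 386},
}

total = sum(sum(row.values()) for row in matrix.values())
correct = sum(matrix[label][label] for label in matrix)
accuracy = correct / total


def recall(label: str) -> float:
    """Share of reference sentences of this class that were recognized."""
    return matrix[label][label] / sum(matrix[label].values())


def precision(label: str) -> float:
    """Share of responses of this class that were correct."""
    return matrix[label][label] / sum(row[label] for row in matrix.values())


print(f"accuracy: {accuracy:.3f}")
for label in matrix:
    print(f"{label}: precision={precision(label):.3f} recall={recall(label):.3f}")
```

Under this reading, 785 of the 809 test sentences are classified correctly, an accuracy of about 0.97.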



13 According to Pang and Lee (2004) [92], the term polarity is used to indicate the 
sentiment of the review: positive or negative. 
14 http://alias-i.com/lingpipe/demos/tutorial/lm/read-me.html 

3.4 Results and discussion 



3.4.1 Results 



The empirical results answer many questions about the differences between shill 
and normal reviews. Shill reviews and normal reviews are compared on 
informativeness, subjectivity and readability. Informativeness reflects the knowledge of 
reviewers about the products they are reviewing. Subjectivity shows the personal 
assessments of the reviewers, while readability is commonly used as a linguistic cue to 
detect text deception. 

The above measures of shill and normal reviews are compared using one-tailed 
independent-samples t-tests. According to Ruxton and Neuhauser (2010), a one-tailed 
hypothesis test is justified when only one direction has meaning and evidence of a 
difference in the other direction is treated identically to non-rejection of a two-tailed test. 
Each hypothesis indicates a direction consistent with a one-tailed test. In addition, 
non-rejection of the null hypothesis is treated the same as evidence of a difference in the 
opposite direction. Non-rejection essentially means that the characteristic is not suitable to 
differentiate shill and normal reviews. For example, evidence that the quantity of official 
features in shill reviews is not larger than the quantity of official features in normal reviews 
indicates that this variable is not suitable to differentiate between shill and normal reviews. 
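
For illustration, the standard errors and a t statistic can be recomputed from the summary statistics in Table 3.7. The sketch below uses the H1a row and assumes the unequal-variance (Welch) form of the statistic; the text does not state which variant of the independent t-test was used, so this is an assumption, and the p-value (which requires the t distribution) is not computed here. Function names are illustrative.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean."""
    return sd / math.sqrt(n)


def welch_t(mean1: float, sd1: float, n1: int,
            mean2: float, sd2: float, n2: int) -> float:
    """t statistic for two independent samples with unequal variances."""
    se1 = standard_error(sd1, n1)
    se2 = standard_error(sd2, n2)
    return (mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)


# H1a row of Table 3.7: official feature quantity, shill vs. normal.
se_shill = standard_error(8.57, 61)
se_normal = standard_error(7.84, 93)
t = welch_t(14.59, 8.57, 61, 6.47, 7.84, 93)
print(round(se_shill, 3), round(se_normal, 3), round(t, 2))
```

The recomputed standard errors are close to the Std. Err. column of Table 3.7; small differences are expected because the table reports rounded means and standard deviations.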

The results in Table 3.7 show that shill reviewers concentrate on the official features 
included in the product description page. This conclusion is supported by the rejection of 
the null hypotheses, along with the large effect sizes, for H1a and H1c about the quantity of 
official features and the percentage of sentences containing official features in shill 
reviews. The large effect sizes (1.019 and 0.966) provide evidence that shill reviews 
contain substantially more official features and a higher percentage of sentences containing 
official features than normal reviews. While there is not enough evidence to detect a 
difference between the quantity of unofficial features per review in shill reviews and that 
in normal reviews, there is enough evidence of a difference between the 
percentage of sentences that contain unofficial features in normal reviews and that in 
shill reviews. Although the negative effect size of -0.344 is small, it indicates that the 
percentage of sentences containing unofficial features per review is 
higher in normal reviews than in shill reviews. 



Table 3.7 Analysis results 

Hypothesis  Measurement                                             Source  N   Mean   Std. Dev.  Std. Err.  p-Value (one-tailed)  Effect size (d) 
H1a         Official Feature Quantity per sentence                  Shill   61  14.59  8.57       1.100      0.000                 1.019 
                                                                    Normal  93  6.47   7.84       0.813 
H1b         Unofficial Feature Quantity per sentence                Shill   61  3.20   2.72       0.348      0.099                 -0.185 
                                                                    Normal  93  3.72   4.84       0.504 
H1c         % Sentence containing an official feature per review    Shill   61  0.81   0.16       0.021      0.000                 0.966 
                                                                    Normal  93  0.60   0.25       0.025 
H1d         % Sentence containing an unofficial feature per review  Shill   61  0.16   0.13       0.016      0.018                 -0.344 
                                                                    Normal  93  0.22   0.20       0.021 
H2          Flesch-Kincaid Reading Ease                             Shill   61  71.82  10.52      1.347      0.001                 -0.498 
                                                                    Normal  93  77.80  12.99      1.347 
            Gunning Fog Index                                       Shill   61  11.18  2.70       0.346      0.056                 0.263 
                                                                    Normal  93  10.24  4.08       0.423 
            Automatic Readability Index                             Shill   61  6.88   2.93       0.376      0.018                 0.320 
                                                                    Normal  93  5.59   4.61       0.478 
            Coleman-Liau Index                                      Shill   61  8.82   1.59       0.204      0.000                 0.630 
                                                                    Normal  93  7.60   2.15       0.223 
            SMOG Index                                              Shill   61  6.88   1.85       0.237      0.000                 0.604 
                                                                    Normal  93  5.55   2.39       0.248 
H3          % Subjective sentence in the review                     Shill   61  0.68   0.26       0.033      0.000                 -1.312 
                                                                    Normal  93  0.93   0.14       0.014 

These results show that hypothesis H1 is weakly supported. Despite this, we 
strongly believe that the product feature is an important variable that can be used to 
separate shill reviews from normal reviews. Mentioning official features frequently does not 
mean that the review is informative, especially when the reviewer is simply repeating the 
information found in the product description. It could mean that the shill reviewer is 
trying to convince consumers that they know a lot about the product. Further evidence of 
this effect is that 100% of the unofficial features mentioned in shill 
reviews are also mentioned in normal reviews. These unofficial features are usually 
popular nouns (e.g. "size" and "user") which are not mentioned in the product description. 
Meanwhile, there are unofficial features which only the normal reviews discuss. The 
weakness of shill reviewers is that they do not have the real product experience needed to 
know about the existence of these unofficial features. 

There is enough evidence to reject the null hypothesis of no difference in the
percentage of subjective sentences between normal and shill reviews. The large negative
effect size means that the percentage of subjective sentences in normal reviews is
substantially larger than that in shill reviews. In other words, normal reviewers tend to
express their personal opinions about the product in their reviews, while shill
reviewers describe the features of the product instead of giving their personal opinions
about it. Thus, hypothesis 2 is strongly supported.
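
The size of this effect can be checked roughly from the summary statistics alone. The sketch below assumes the effect size is a standardized mean difference (Cohen's d) computed with the pooled standard deviation; the text does not state which formula was used, and the function name cohens_d is ours. With the rounded means and standard deviations for the share of subjective sentences (shill: 0.68, 0.26, n = 61; normal: 0.93, 0.14, n = 93), the result is close to the reported value of -1.312, and the small gap is plausibly due to rounding of the inputs.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (group a minus group b), pooled SD."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Rounded summary statistics for the share of subjective sentences
# (shill group first, normal group second).
d_subjective = cohens_d(0.68, 0.26, 61, 0.93, 0.14, 93)
print(round(d_subjective, 2))
```

A negative value means the first group (shill reviews) has the lower mean, which matches the interpretation given above.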

The results indicate that all readability measures except the Gunning Fog Index
show sufficient evidence to reject the null hypothesis of no difference between shill and
normal reviews. The effect sizes of the Coleman-Liau Index, the SMOG Index and the
Flesch-Kincaid Reading Ease are medium, while the effect size of
the Automatic Readability Index is small. This result provides evidence to support
hypothesis H3: shill reviews are more difficult to read than normal reviews.
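
For readers unfamiliar with these measures, two of them can be computed from simple character, word and sentence counts. The sketch below uses the standard published formulas for the Automated (here called Automatic) Readability Index and the Coleman-Liau Index. The tokenization is a deliberately crude regex heuristic of our own; the tool used in this study is not specified, so scores from this sketch should not be expected to reproduce the table exactly.

```python
import re

def text_counts(text):
    """Return (letters, words, sentences) using simple regex heuristics."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    letters = sum(ch.isalnum() for word in words for ch in word)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return letters, max(1, len(words)), sentences

def automated_readability_index(text):
    letters, words, sentences = text_counts(text)
    return 4.71 * letters / words + 0.5 * words / sentences - 21.43

def coleman_liau_index(text):
    letters, words, sentences = text_counts(text)
    letters_per_100 = letters / words * 100      # L in the usual notation
    sentences_per_100 = sentences / words * 100  # S in the usual notation
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

sample = "The cat sat on the mat. It was happy."
ari = automated_readability_index(sample)
cli = coleman_liau_index(sample)
```

Both indexes rise with longer words and longer sentences, so a higher score indicates text that is harder to read.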

3.4.2 Discussion 

The occurrence of review manipulation has the potential to undermine the
effectiveness of review systems. A successful shill attack might trick consumers into
buying a low-quality product and damage sales of competing products. An unsuccessful
shill attack (e.g. one in which shill reviews are detected by consumers) might cause consumers
to lose trust in review systems and drive them from the marketplace. Therefore, a powerful shill
review detection method is essential for online marketplaces. The results
of this study indicate that official features, readability and subjectivity of the reviews are
reliable factors for separating shill reviews from normal reviews. These factors can be added
to current methods to strengthen their ability to detect shill reviews. Effective shill review
detection mechanisms help gain consumer trust in review systems and maintain a fair
marketplace.

Product reviews have become an important resource for both consumers and
sellers on online marketplaces. For consumers, product reviews provide an information
channel, different from the sellers' ads, about the product features and their quality.
The reviewers' experience of using the product helps consumers make an informed
purchase decision (Bounie et al., 2005). With its promising performance, the product feature
extraction method proposed in this study might increase the benefits that reviews bring to
consumers and sellers. An effective product feature extraction method can enhance
the performance of existing review summarization methods, making it easier for
consumers to read reviews.

For sellers, reviews not only increase product sales but also provide useful
information about the product features the consumers discuss (Cui et al., 2010; Ye et al.,
2009). Manufacturers can gather feedback from their consumers by extracting the product
features in reviews. The feature extraction method introduced in this study can detect
official and unofficial features separately. The ability to detect official features provides
product manufacturers with valuable consumer opinions on the product. In addition,
comments on unofficial features might bring useful knowledge about the features that
consumers care about but that are not included in the product descriptions. This knowledge can
be helpful in marketing the product or improving its quality.

4. The Impact of Shill Reviews on Perceived Quality

4.1 Theoretical background 

Perceived quality is an important variable in marketing research (Zeithaml, 1988).
It not only influences the behavior of consumers but also tells
manufacturers what consumers think about their products.
Zeithaml (1988) presents a model of the factors that impact perceived quality. In
Zeithaml's model, the components that affect perceived quality are
reputation, an abstract dimension, and perceived monetary price. These components are
called "the perception of lower level attributes". For perceived monetary price, studies
show that consumers do not always remember the price of an item but encode it in
a way that is meaningful or easy to remember (Dickson and Sawyer, 1985;
Jacoby et al., 1978). Instead of remembering that the exact price of an MP3 player is
$47.84, a shopper may encode it as "low" or "high", or as "affordable" or "expensive". It is the
perception of lower level attributes that has the direct impact on perceived quality.

With the emergence of review systems, product reviews have become a new factor that
impacts perceived quality. While advertising provides product information from the
manufacturer's perspective, product reviews provide product information from the
product user's perspective. According to a 2009 report by the Nielsen Company², online
opinions are more trusted than most forms of advertising. Li and Hitt (2008) found that,
prior to buying an experience good, consumers' expectations of product quality can be
affected by product reviews. Therefore, we expect that positive shill reviews can impact
perceived product quality.

² http://blog.nielsen.com/nielsenwire/consumer/global-advertising-consumers-trust-real-friends-and-virtual-strangers-the-most/



[Figure 4.1: diagram. Recoverable labels: Brand name; Level of advertising; Abstract
dimension; Objective price; Perceived monetary price; Perceived quality; Extrinsic
attributes; Intrinsic attributes; Perception of lower level attributes; Higher-level
abstractions.]

Figure 4.1 Factors that impact perceived quality

Three metrics, valence, volume and variance, are frequently used to measure the
impact of the rating summary on the first impression of product quality. Although
positive shill reviews cannot be singled out in a rating summary, all three metrics
are affected by their ratings. Since positive shill reviews usually
have high ratings (Mukherjee, Liu and Glance, 2012), adding shill reviews to
the review set increases the valence of the product. The volume increases whenever shill
reviews are added. The ratings of shill reviews also increase the variance of the product
ratings: only negatively rated products need help from shills, so the ratings of
shill reviews must run in the opposite direction to the other reviews in the review set (Wu et
al., 2010).

Valence is usually represented by the average rating measure (Clemons et al.,
2006; Dellarocas and Narayan, 2006; Dellarocas et al., 2007). The average rating is the
overall assessment of the product by the reviewers. Consumers use the average rating to
compare products, with the average rating serving as a proxy for product quality
(Cui et al., 2010; Forman et al., 2008). A product with a better average rating can be
considered a better product. In line with this argument, Moe et al. (2011) stated that "a
'good' product is likely to experience higher sales and receive more positive ratings than
a 'bad' product".

Volume is usually measured by the quantity of reviews. Volume has been found
to impact product sales (Awad et al., 2006; Duan et al., 2008; Liu, 2006). One reason for
this effect is that the volume of reviews shows the level of discussion about the product,
which can help increase awareness among consumers (Cui et al., 2010). The volume
of ratings received is one way to estimate the size of the group of consumers who
have bought and used the product. A larger group means more people have used the
product, regardless of the ratings they provide.

Variance is measured using the statistical variance of the ratings. Typically,
variance is available to consumers in a rating distribution chart. Variance has also been
found to have a significant effect on product sales. Variance represents the disagreement
among the reviewers about a product (Awad et al., 2006). High disagreement among the
reviewers means different users perceive product quality differently. A large variance
does not necessarily mean that the product is a good product. Instead, variance in ratings
may just signal that the product is suitable for a portion of consumers and less suitable for
others. Sun (2008) found that when the average rating of the product is low, more
variance helped to increase profit.

Based on the prior research related to valence, volume and variance, we 
hypothesize: 

H4a: The valence of product ratings positively impacts perceived product quality. 
H4b: The volume of product ratings positively impacts perceived product quality. 
H4c: The influence of variance on perceived product quality is affected by average 
rating. 

Consumer behavior is affected by risk perception because any action may have
unanticipated consequences (Bauer, 1960). Bauer (1960) revealed that perceived risk is
associated with the consumer's data acquisition process both before and after the purchase
decision. According to Lutz and Reilly (1974), when product risk perception is high,
consumers tend to collect more information about the product. The more information is
collected, the fewer unknown problems with the product remain. Since word-of-mouth
has an effect on perceived risk (Ross, 1975), we argue that word-of-mouth can influence
the data acquisition process. In this study, the data acquisition process is represented by
the review usage of consumers.

Usage of actual review comments can also be related to the valence, volume and
variance of reviews. Research shows that average rating is used as a proxy for product
quality (Cui et al., 2010; Forman et al., 2008). Thus, better ratings can reduce perceived
product risk. Similarly, a large variance shows disagreement among the reviewers, which
can lead to higher perceived risk (Awad et al., 2006). Review usage can also be related to
the number of reviews available to read. Past research has shown that consumers read no
more than two pages of reviews (Pavlou and Dimoka, 2006). Since thinly reviewed
products have a small number of reviews, volume dictates the quantity of reviews available
for the consumers to read. Thus, we hypothesize that:

H5a: The valence of product ratings positively impacts the total quantity of reviews read.
H5b: The valence of product ratings positively impacts the median time spent on reading
reviews.
H6a: The volume of product ratings positively impacts the total quantity of reviews read.
H6b: The volume of product ratings positively impacts the median time spent on reading
reviews.
H7a: The variance of product ratings positively impacts the total quantity of reviews
read.
H7b: The variance of product ratings positively impacts the median time spent on
reading reviews.

Prior to a purchase, consumers shopping online usually have no
physical contact with the product. Thus, consumers tend to look for additional product
information from previous product users. By reading the reviews, consumers might
collect new information that is not available in the rating summary or in
the official product description. The new information might change the consumer's first
impression about the quality of the product. As an example, an experiment compared one
group of participants that read positive reviews of a film with another group of
participants that read negative reviews of the same film (Wyatt and Badger, 1984). After
viewing the film, the participants were asked to evaluate it. The results showed
that the direction of the reviews significantly impacted the direction of the evaluation. In
practice, negative reviews have been found to impact product sales (Cui et al., 2010;
Dellarocas et al., 2007; Forman et al., 2008; Moe et al., 2011). One explanation for this
effect is that negative reviews negatively impact perceived product quality which, in turn,
negatively impacts the buying decision of the consumers (Buttle, 1998). We hypothesize
that:

H8a: Total quantity of negative normal reviews read negatively changes the first 
impression about product quality. 

H8b: The median time spent on negative normal reviews negatively changes the first 
impression about product quality. 

As shown by Wyatt and Badger (1984), positive reviews, if read, can impact
consumer perception of product quality. The reason is that consumers read positive
reviews to strengthen their beliefs about the quality of products (Moe, 2009). The rating
of positive shill reviews is usually 4 or 5 stars (Mukherjee et al., 2012). Because the
overall rating of a product is calculated as the simple average of its ratings (Jøsang and
Ismail, 2002), shill reviews with high ratings increase the overall average rating of the
target product, especially for thinly reviewed products. In this study, shill reviews are
injected when the product is negatively rated. In such a situation, positive shill reviews
might have a better chance of being read because the subjects seek opposing opinions. Thus,
we hypothesize that:

H9a: Total quantity of shill reviews read positively changes the first impression about
product quality.
H9b: The time spent on reading the shill reviews positively changes the first impression
about product quality.

Table 4.1 Summary of the hypotheses

Number  Independent variable                              Direction of impact                           Dependent variable

Group: First impression
H4a     Valence                                           Positive                                      Perceived quality
H4b     Volume                                            Positive                                      Perceived quality
H4c     Variance                                          High valence: negative; low valence: positive Perceived quality

Group: Review usage
H5a     Valence                                           Positive                                      Qty of reviews read
H5b     Valence                                           Positive                                      Median time spent
H6a     Volume                                            Positive                                      Qty of reviews read
H6b     Volume                                            Positive                                      Median time spent
H7a     Variance                                          Positive                                      Qty of reviews read
H7b     Variance                                          Positive                                      Median time spent

Group: Change in perceived product quality
H8a     Qty of negative normal reviews read               Negative                                      Change in perceived quality
H8b     Median time spent on each negative normal review  Negative                                      Change in perceived quality
H9a     Qty of positive shill reviews read                Positive                                      Change in perceived quality
H9b     Median time spent on each positive shill review   Positive                                      Change in perceived quality

4.2 Models

Three linear regression models were used to test the hypotheses (Figure 4.2). 
Model 1 answers the research question about the impact of the rating summary on the 
first impression a consumer may have about a product. In model 1, the average rating 
positively impacts perceived quality and also influences the effect of review variance on 
perceived quality. If the average rating is high, more variance is expected to negatively 
impact perceived quality. If the average rating is low, more variance may positively 
impact perceived quality. 

In Figure 4.2, Model 2 overlaps with both Model 1 and Model 3: the independent
variables of Model 2 are also the independent variables of Model 1, and
the dependent variables of Model 2 are the independent variables of Model 3. In Models 1
and 2, average rating, volume and variance are the factors that impact both the first
impression about product quality and review usage. Average rating, variance and
volume are all expected to have a positive effect on review usage. Review usage, in turn, will
change the first impression about product quality because the content of the reviews
provides more detailed information about the product.

Model 3 addresses the change in product quality perception once the content of 
the reviews is read. The review usage is measured by two variables: total quantity of 
reviews read and median time spent on each review. The review usage of both shill 
reviews and normal reviews is observed separately in order to measure the effect of both 
types of reviews on the change of quality perception. 
[Figure 4.2: diagram linking Model 1, Model 2 and Model 3. Recoverable labels: H4a (+);
First impression; The usage of negative normal reviews; H8 (-); Perceived quality; The
usage of positive shill reviews; H9 (+).]

Figure 4.2 Relationship of the models

Model 1:



$$PQ_i = \alpha + \alpha_1 \mathit{AveRating}_i + \alpha_2 \mathit{VolRating}_i + \alpha_3 \mathit{VarRating}_i \times \mathit{AveRating}_i + \varepsilon_i$$

Where:

$PQ_i$: The first impression of product $i$

$$\mathit{AveRating}_i = \frac{\sum_{q_1=1}^{n_i^1} \mathit{NormalRating}_{q_1} + \sum_{q_2=1}^{n_i^2} \mathit{ShillRating}_{q_2}}{n_i^1 + n_i^2}$$

$$\mathit{VolRating}_i = n_i^1 + n_i^2$$

$$\mathit{VarRating}_i = \frac{\sum_{q_1=1}^{n_i^1} (\mathit{NormalRating}_{q_1} - \mathit{AveRating}_i)^2 + \sum_{q_2=1}^{n_i^2} (\mathit{ShillRating}_{q_2} - \mathit{AveRating}_i)^2}{n_i^1 + n_i^2}$$

$n_i^1$: Quantity of normal reviews in the review set of product $i$
$n_i^2$: Quantity of shill reviews in the review set of product $i$
Model 2:

$$TQ_i = \alpha + \alpha_1 \mathit{AveRating}_i + \alpha_2 \mathit{VolRating}_i + \alpha_3 \mathit{VarRating}_i + \varepsilon_i$$
$$MT_i = \beta + \beta_1 \mathit{AveRating}_i + \beta_2 \mathit{VolRating}_i + \beta_3 \mathit{VarRating}_i + \varepsilon_i$$

Where:
$TQ_i$ is the total number of reviews of product $i$ read by the consumer.
$MT_i$ is the median time the consumer spends on each review of product $i$.
Model 3:

$$\Delta PQ_i = PQ_i^l - PQ_i^f = \beta_0 + \beta_1 TQ_i^N + \beta_2 MT_i^N + \beta_3 TQ_i^S + \beta_4 MT_i^S + \varepsilon_i$$

Where:
$PQ_i^l$ is the perceived quality of product $i$ after the reviews are read.
$PQ_i^f$ is the first impression of product $i$, i.e. $PQ_i$ in Model 1.
$\Delta PQ_i$ is the difference in perceived quality before and after the reviews are read.
$TQ_i^N$ is the total of normal reviews of product $i$ read by the consumer.
$MT_i^N$ is the median time the consumer spends on the normal reviews of product $i$.
$TQ_i^S$ is the total of shill reviews of product $i$ read by the consumer.
$MT_i^S$ is the median time the consumer spends on the shill reviews of product $i$.
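
To make the three rating summary variables of Model 1 concrete, the following minimal sketch computes valence (average rating), volume and variance for a mixed review set, dividing the variance by the total number of reviews as in the definition above. The function name rating_summary and the example ratings are hypothetical and are not data from the experiment.

```python
def rating_summary(normal_ratings, shill_ratings):
    """Return (valence, volume, variance) for a mixed set of star ratings."""
    ratings = list(normal_ratings) + list(shill_ratings)
    volume = len(ratings)
    if volume == 0:
        raise ValueError("the review set must contain at least one rating")
    valence = sum(ratings) / volume
    # Population variance: divide by the number of ratings, not by n - 1.
    variance = sum((r - valence) ** 2 for r in ratings) / volume
    return valence, volume, variance

# Hypothetical thinly reviewed product: four negative normal reviews,
# then the same product after two 5-star shill reviews are injected.
before = rating_summary([1, 2, 2, 1], [])
after = rating_summary([1, 2, 2, 1], [5, 5])
print(before, after)
```

In this made-up example all three metrics rise once the shill ratings are added, which is the pattern the theoretical background describes for negatively rated products.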

4.3 The experiment 

Testing the impact of shill reviews on perceived quality requires both shill
reviews and normal reviews to be gathered and shown to the subjects. Shill reviews were
submitted by shills who have an undisclosed relationship with the seller. A normal review,
unlike a shill review, is free of any undisclosed relationship between seller and reviewer. The
collected shill and normal reviews were mixed together to create different review sets.
The same product was shown to the subjects along with one of these review sets.
Different review sets with different shill and normal review combinations allowed us to
measure the impact of the reviews on perceived quality. Figure 4.3 illustrates the steps of
the experiment.
[Figure 4.3: flowchart. Recoverable steps, in order:
1. Collect positive shill reviews; collect negative normal reviews
2. Create the review set of the product by randomly mixing the shill and normal reviews
3. Show the participants the average rating, quantity and distribution of the reviews
4. Survey the product perceived quality
5. Show the participants the rating, title and content of all reviews
6. Monitor the review reading behavior
7. Survey the product perceived quality
8. Compare the difference]

Figure 4.3 The experiment

The experiment simulates the situation in which a seller wants to dishonestly
promote a thinly reviewed product. In the experiment, an MP3 player was shown to the
subjects with basic technical information about the product along with the overall rating
information: the average rating, the quantity of reviews and the distribution of the
reviews. The review set of the product was a random mix of positive shill reviews
and negative normal reviews. With the available information, the subjects were asked to
give their perception of the product quality. This response could not
be changed after submission. Then, the subjects were shown the rating, title and text
comment of all the reviews. The review usage of the subjects was recorded. Finally, the
subjects were asked about their perception of the product quality again. The comparison of the
product quality perceptions before and after the reviews were read indicated the impact of
the review content on product quality perception.
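
The mixing step can be sketched as below. This is only an illustration under our own assumptions: the function name make_review_set, the tuple layout (type, stars), the pool contents and the number of reviews drawn from each pool are all hypothetical, since the text states only that positive shill reviews and negative normal reviews were mixed at random.

```python
import random

def make_review_set(shill_pool, normal_pool, n_shill, n_normal, seed=None):
    """Draw reviews from each pool without replacement and shuffle their order."""
    rng = random.Random(seed)
    review_set = rng.sample(shill_pool, n_shill) + rng.sample(normal_pool, n_normal)
    rng.shuffle(review_set)
    return review_set

# Hypothetical pools of (type, stars) pairs.
shill_pool = [("shill", 5), ("shill", 4), ("shill", 5)]
normal_pool = [("normal", 1), ("normal", 2), ("normal", 1), ("normal", 2)]
review_set = make_review_set(shill_pool, normal_pool, n_shill=2, n_normal=3, seed=42)
print(review_set)
```

Passing a seed makes a given review set reproducible, which is convenient when the same set has to be shown again or audited later.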

Perceived quality is a multidimensional construct which is unobservable, context
dependent and difficult to measure (Zeithaml, 1988). Accurate measurement of perceived
quality involves identification of specific quality dimensions and careful justification of
the validity of the dimensions (Parasuraman, Zeithaml and Berry, 1985). As an alternative to
multidimensional measurement, unidimensional scales have been used to measure quality
(Moorthy and Zhao, 2000; Tsiotsou, 2005; Zeithaml, 1988). The problem with using a
unidimensional scale to measure perceived quality is the difficulty of interpreting the results: a
unidimensional scale cannot provide detailed information about the specific quality
dimensions associated with respondent ratings. However, because so many product
quality dimensions are mentioned in review content, it is difficult to design a
multidimensional scale that measures all of them. Therefore, a unidimensional
scale was selected to measure perceived quality in this study (see Appendix B).

Review usage is measured in two dimensions: quantity of reviews read and
median time spent on each review. The total quantity of reviews read gives the
quantity of each type of review (i.e. normal and shill) read by the consumers. To isolate
the impact of shill reviews, we have to establish that shill reviews were actually read. Median
time spent on a review measures the reading effort of the consumers.
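
As an illustration of these two measures, the sketch below derives them from a reading log. The log format, a list of (review type, seconds spent) pairs with one entry per review the subject opened, is our assumption; the text does not describe how reading behavior was recorded, and the function name review_usage is hypothetical.

```python
from statistics import median

def review_usage(reading_log):
    """Summarize a list of (review_type, seconds) pairs per review type."""
    usage = {}
    for kind in ("normal", "shill"):
        times = [seconds for review_type, seconds in reading_log if review_type == kind]
        usage[kind] = {
            "total_read": len(times),
            # Median is undefined for an empty list, so report 0.0 in that case.
            "median_time": median(times) if times else 0.0,
        }
    return usage

# Hypothetical log for one subject.
log = [("normal", 12.0), ("shill", 30.0), ("normal", 20.0),
       ("shill", 10.0), ("normal", 16.0)]
usage = review_usage(log)
print(usage)
```

The median is less sensitive than the mean to a single review left open for a long time, which may be one reason to prefer it as a measure of reading effort.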

To recruit the subjects for this experiment, a probability sample of undergraduate
and graduate students was used. Students are an appropriate population for this study
because they are young and familiar with the internet and online shopping. In addition,
students are active users of technology, especially MP3 players. Invitation emails were
sent to 6,000 students selected at random from a list of 17,000. To comply with the
requirement of probability
sampling, every student in the list had the same probability of receiving the invitation
email. In total, 175 students participated in the experiment. This sample
size is reasonable according to the analysis in section 4.4.1.

4.4 Results and discussion 



4.4.1 Multicollinearity testing and sample size 

The independent variables of all three models were tested for multicollinearity
using scatter plots. If multicollinearity occurs between two variables, a linear pattern
should appear on the scatter plot. Figure 4.4 suggests that there is
no multicollinearity among the independent variables in any of the three models.

Figure 4.4 The scatter plots of independent variables

There are two approaches that we can take to test the models: stepwise and
confirmatory specification. The stepwise approach allows us to find the best set of
predictor variables (Hair, Black, Babin, Anderson and Tatham, 2005). However, the stepwise
approach might drop variables of interest whose effect we specifically want to examine.
It is equally important to know whether a particular factor is important or not. For
example, the result of this estimation helps determine whether the quantity of positive shill
reviews read has a positive impact on perceived quality.

Since the purpose of the estimation process is not to look for the best model but to
confirm the role of the variables of interest, we do not use the stepwise approach. Instead,
we take the confirmatory specification approach to test the models because this approach
helps us answer the research questions. A t-test is used to assess the significance level of the
coefficients. To assess the overall model fit, we use adjusted R².

Since this research involves multiple t-tests, the problem of multiple comparisons
must be addressed. The multiple comparisons problem involves two error rates
(www.statistics.com¹⁶): the comparison-wise rate, which applies to individual t-tests, and the
family-wise rate, which applies to the entire set of t-tests. The family-wise rate is the
probability of making at least one type I error when conducting the experiment. So, even
though the comparison-wise error rate is satisfied, the family-wise error rate may not be
acceptable. The experimenter determines one rate and the other rate is determined by the
multiple comparisons technique (Huberty and Morris, 1989). In our experiment, we set
the family-wise error rate to α_FW = .05. According to McClave, Benson and Sincich (2008),
there are three widely used techniques to address the problem of multiple comparisons:
Bonferroni, Scheffé and Tukey. We selected the Bonferroni correction method in this
experiment because we do pair-wise comparisons and our samples have unequal sizes.

Table 4.2 shows the results of the Bonferroni correction method.

¹⁶ www.statistics.com is the official website of the Institute for Statistics Education
Table 4.2 Family-wise and comparison-wise Type I error rate

Model   α_FW    α_CW
1       0.05    0.017
2       0.05    0.017
3       0.05    0.010
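
The Bonferroni correction itself is a one-line calculation: the comparison-wise rate is the family-wise rate divided by the number of comparisons in the family. The sketch below reproduces the values in Table 4.2 under the assumption, which is ours and is not stated in the text, that Models 1 and 2 each involve three comparisons and Model 3 involves five. The function name bonferroni_alpha is hypothetical.

```python
def bonferroni_alpha(alpha_family, n_comparisons):
    """Comparison-wise type I error rate under the Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha_family / n_comparisons

# Assumed numbers of comparisons per model (not given in the text).
alpha_cw = {m: round(bonferroni_alpha(0.05, m), 3) for m in (3, 5)}
print(alpha_cw)
```

Because each individual test is held to a stricter threshold, the probability of at least one false positive across the whole family stays at or below the chosen family-wise rate.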



According to Hair et al. (2005), the factors that affect statistical power
are the effect size, the type I error rate (α) and the sample size. A larger effect size is more
likely to result in higher statistical power. Since type I and type II errors are inversely
related, reducing α increases the probability of a type II error, which in turn decreases
the power. A larger sample size also increases the power of the test. Following the
guidelines suggested in Cohen (1998), our objective is a power level of
at least 0.80 for each model, with a type I error rate no larger than 0.05. In this study, we use
Cohen's f² to measure the effect size. According to Cohen (1998), the effect size
measured by Cohen's f² can be divided into three levels: small (f² = .10), medium
(f² = .25) and large (f² = .40).

Table 4.3 shows the minimum sample size required for a power of 0.8 at
different effect sizes. To cover the case in which the effect size is small, we set alpha at 0.01,
which requires a sample size of 174 for a model with four independent variables.
Table 4.3 Minimum required sample size

                 alpha (α) = 0.05                     alpha (α) = 0.01
                 Quantity of independent variables    Quantity of independent variables
Effect size      3          4                         3          4
Small (0.10)     112        124                       160        174
Medium (0.25)    48         53                        68         74
Large (0.40)     32         35                        45

Another approach to identifying the sample size is to use the rule of thumb suggested
in Hair et al. (2005): the ideal sample size for multiple regression models should provide
20 observations for each independent variable. Because the most complicated model
in our study contains 4 independent variables, we need a sample with at least 80 data
points. However, we want to collect as much data as possible because the size of the
sample has a direct impact on the statistical power of a linear regression model. The most
difficult challenge in obtaining the desired sample is the availability of resources for the
incentive. To collect the required sample, we gave a $5 gift card for each complete
submission.

4.4.2 Results 

The empirical results answer the questions about the influence of product reviews 
on perceived quality and the effectiveness of shill reviews on changing consumer quality 
perception. Three regression models were employed to estimate the impact of variables in 
rating summary on the first impression and review usage and the impact of review usage 
on the difference in final perceived quality and the first impression. 
We are concerned about the correlation between variance and the average rating
in the review sets because each review set is the result of mixing reviews from two
groups with opposite rating values. The positive shill review group consists
only of reviews with high ratings. The negative normal review group consists only of reviews
with low ratings. Therefore, as the average rating approaches the middle value (e.g.
3 stars), the variance becomes larger. This concern is partly addressed by using 4- and 5-star
reviews in the high rating review group and 1- and 2-star reviews in the low rating review
group.

As indicated in Table 4.4, there is a significant relationship between average rating and a
consumer's first impression about the quality of the product. The effect size of the model
is large. This result supports the statement by Forman et al. (2008) that the product
rating can be used as a proxy for product quality. The results suggest that one additional
star in the average rating increases the perceived quality rating by 0.72. This situation is
ideal for shill attacks because the direct result of a positive shill review is an improved
average rating of the product. Therefore, if consumers use only the average rating to
assess the quality of the product, shill attacks can be very effective in making them think
that the quality of the product is good. The effects of variance and volume on the first
impression were not statistically significant. This makes sense because, unlike average
rating, which directly reflects product quality, volume and variance only imply the size of
the population of product users and the level of agreement among reviews about the product.
These results are similar to the conclusions of Hu et al. (2012) that volume and variance
are not reliable factors for predicting perceived product quality.

Table 4.4 The impact of shill reviews on the first impression

Model        Coefficient   Std. error   Significance
Constant     1.935         0.302        0.000
AveRating    0.720         0.074        0.000
VarRating    -0.007        0.059        0.909
VolRating    -0.039        0.031        0.211

• Dependent variable: The first impression
• Adjusted R-square: 0.353
• Significance of model: 0.000
• Effect size: 0.572
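
The reported effect size appears consistent with Cohen's f² = R² / (1 - R²), where R² is recovered from the adjusted R². The sketch below is a check under our own assumptions, namely a sample of n = 175 subjects and k = 3 predictors; the text does not state how the effect size was derived, and the function names are hypothetical.

```python
def r2_from_adjusted(adj_r2, n, k):
    """Invert adj_R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1) to recover R2."""
    return 1 - (1 - adj_r2) * (n - k - 1) / (n - 1)

def cohens_f2(r2):
    """Cohen's f-squared effect size for a regression model."""
    return r2 / (1 - r2)

# Assumed inputs: adjusted R-square from Table 4.4, n = 175 subjects, k = 3 predictors.
r2 = r2_from_adjusted(0.353, n=175, k=3)
f2 = cohens_f2(r2)
print(round(r2, 3), round(f2, 3))
```

Under these assumptions the result lands very close to the reported 0.572, which suggests that the effect sizes in the result tables are f² values based on the unadjusted R².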



The results from Table 4.5 show that only volume has a statistically significant
impact on the total quantity of reviews read by the consumers. This result makes sense
because volume limits the quantity of reviews that are available for the consumers to
read. Furthermore, in this study, the maximum quantity of reviews for each product was
10. Because this quantity is small, the subjects might read the entire review set regardless of
the variance and the average rating of the reviews.

Table 4.5 Factors that impact quantity of reviews read

Model        Coefficient   Std. error   Significance
Constant     -0.165        0.462        0.722
AveRating    0.147         0.113        0.194
VarRating    0.113         0.090        0.211
VolRating    0.792         0.048        0.000

• Dependent variable: Quantity of reviews read
• Adjusted R-square: 0.643
• Significance of model: 0.000
• Effect size: 1.849

Table 4.6 indicates that only volume impacts the median time spent on each review.
The effect size of this model is very small. The results suggest that when
more reviews are available, the subjects spend less time reading each review. In
combination with the previous results, we can conclude that when the quantity of reviews is
small, consumers tend to read all the reviews; however, the more reviews are available, the
less time they spend on each individual review. The data also support this conclusion:
82.43% of the participants read all the reviews that were available. The subjects who read
all of their reviews had on average 5.21 reviews in the review set, while the subjects who did
not read all of the reviews had 6.44. However, the difference in quantity of
reviews between the two groups of subjects is not statistically significant at the 95%
confidence level (p-value = 0.066).



Table 4.6 Factors that impact the median time spent on each review 

Model        Coefficients              Significance 
             B          Std. error 
Constant     25.639     3.347          0.000 
AveRating     0.248     0.819          0.763 
VarRating     0.037     0.650          0.951 
VolRating    -1.054     0.346          0.003 

• Dependent variable: Median time spent 
• R-square: 0.041 
• Significance of model: 0.017 
• Effect size: 0.060 





The subjects were asked to assess the quality of the product at two different points: 
before and after they read the content of the reviews. After reading the reviews, 66.9% of 
the subjects changed their first impression about the quality of the product. The results 
from Table 4.7 indicate that shill reviews have a statistically significant impact on the 
change of quality perception while normal reviews do not. Reading an additional shill 
review increased the perceived quality rating by 0.095. The results also suggest that one 
additional second spent on reading a shill review increases the perceived quality rating of 
the consumers by 0.012. The small adjusted R-square and effect size indicate that the usage 
of the reviews only explains a small part of the change in consumer quality perception. 
Additional research will be necessary to find the major factors that influence the change 
in perceived quality. 

Table 4.7 Factors that impact the change in perceived quality 

Model        Coefficients              Significance 
             B          Std. error 
Constant     -0.399     0.212          0.061 
TQ S          0.095     0.041          0.020 
MT S          0.012     0.005          0.021 
TQ N         -0.096     0.049          0.052 
MT N         -0.003     0.006          0.626 

• Dependent variable: Change in perceived quality 
• Adjusted R-square: 0.081 
• Significance of model: 0.001 
• Effect size: 0.114 
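To make the coefficients in Table 4.7 concrete, the following Python sketch evaluates the fitted linear model for a hypothetical participant. It assumes that the TQ rows refer to the total quantity of reviews read and the MT rows to the time in seconds spent reading them, for shill and normal (N) reviews respectively; this reading of the row labels is inferred from the surrounding text. The function name and the input values are invented for illustration and are not data from the experiment.

```python
# Coefficients copied from Table 4.7 (dependent variable: change in perceived quality).
# The meaning attached to each key is an assumption based on the surrounding text.
COEFFICIENTS = {
    "constant": -0.399,
    "tq_shill": 0.095,    # number of shill reviews read
    "mt_shill": 0.012,    # seconds spent reading shill reviews
    "tq_normal": -0.096,  # number of normal reviews read
    "mt_normal": -0.003,  # seconds spent reading normal reviews
}


def predicted_change(tq_shill, mt_shill, tq_normal, mt_normal):
    """Return the model's predicted change in the perceived quality rating."""
    c = COEFFICIENTS
    return (
        c["constant"]
        + c["tq_shill"] * tq_shill
        + c["mt_shill"] * mt_shill
        + c["tq_normal"] * tq_normal
        + c["mt_normal"] * mt_normal
    )


# Hypothetical participant: 3 shill reviews (30 s) and 2 normal reviews (20 s).
baseline = predicted_change(3, 30, 2, 20)
# Reading one more shill review shifts the prediction by the shill-quantity coefficient.
one_more_shill = predicted_change(4, 30, 2, 20)
print(round(one_more_shill - baseline, 3))
```

Because the adjusted R-square of this model is small, such predictions describe only the average direction of the effect and say little about any individual participant.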



Further analysis also provides evidence about the effect of shill level on perceived 
quality. The shill level is measured as the percentage of shill reviews in the review set. 
Each subject is exposed to shill reviews at two different levels. At the rating summary 
level, the shill level is calculated over all shill and normal reviews shown for the 
product. After the subjects read the reviews, the shill level can be measured as the 
percentage of shill reviews among the reviews actually read. The results in Table 4.8 show 
that both shill levels have a significant impact on perceived quality. If the initial shill 
level increases by one percentage point, the first impression of perceived quality 
increases by 0.024. Once the subjects read the reviews, if the shill level based on reviews 
read increases by one percentage point, the final perceived quality rating increases by 
0.034. 

Table 4.8 The impact of shill level on perceived quality 

Model                          Coefficients            Significance   Effect size   Adjusted 
                               B        Std. error                    (f)           R-square 

Dependent variable: First impression 
Constant                       2.594    0.163          0.000          0.46          0.347 
Shill level                    0.024    0.002          0.000 

Dependent variable: Final perceived quality 
Constant                       1.886    0.161          0.000          1.15          0.531 
Shill level of reviews read    0.034    0.002          0.000 
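The two models in Table 4.8 can be illustrated with a short Python sketch. It computes the shill level as a percentage, as defined above, and plugs it into the reported constants and coefficients. The function names and the example review counts are hypothetical, chosen only to show the arithmetic.

```python
def shill_level(shill_count, total_count):
    """Percentage of shill reviews in a review set (0 to 100)."""
    if total_count <= 0:
        raise ValueError("the review set must contain at least one review")
    return 100.0 * shill_count / total_count


def first_impression(level):
    """First-impression quality rating predicted from the initial shill level (Table 4.8)."""
    return 2.594 + 0.024 * level


def final_quality(level_read):
    """Final quality rating predicted from the shill level of the reviews read (Table 4.8)."""
    return 1.886 + 0.034 * level_read


# Hypothetical product: 10 reviews in the summary, 4 of them shill reviews.
initial = shill_level(4, 10)
# Hypothetical participant who read 5 reviews, 3 of them shill reviews.
after_reading = shill_level(3, 5)
print(first_impression(initial), final_quality(after_reading))
```

The larger coefficient in the second model is what the text refers to when it says that the effect of shill reviews is stronger after the reviews are read.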



4.4.3 Discussion 



The results suggest that shill reviews are more influential for thinly reviewed 
products because shill reviews have the greatest potential to improve overall product 
ratings for products with fewer normal reviews. In addition, the effect of shill reviews 
was even stronger after being read. One explanation for this result is that shill reviews 
are usually extreme, which can make them more persuasive to consumers than more neutral 
(unbiased) reviews (Mukherjee et al., 2012). Another reason for shill reviews to have 
such an impact is that consumers are unaware that a particular review is actually a shill 
review. These findings should raise awareness of the danger of review 
manipulation. Online marketplaces should pay more attention to developing effective 
shill review detection methods to protect both consumers and honest sellers. 

The empirical results show that consumers use the average rating as an indicator 
of product quality. This result is consistent with prior research that suggests that average 
product ratings are used as a proxy for product quality. Therefore, maintaining a good 
product rating is very important to the long-term success of products. Regardless of the 
review quantity, products with a better average rating create a better first impression 
about product quality. However, the overall review quantity does influence the quantity 
of reviews read by consumers. When the quantity of reviews is relatively small (e.g., 10 
or fewer), consumers tend to read all the reviews. 

5. Conclusions 

In this chapter, the content of the previous chapters is summarized and the 
implications of the findings, limitations and future work are discussed. 

5.1 Summary of chapters 

Chapter 1 introduced the problem of product reputation manipulation. Product 
reputation is manipulated using shill reviews. The objectives of this study are to explore 
the linguistic characteristics of shill reviews and to measure the impact of shill reviews on 
perceived product quality. 

Chapter 2 reviewed the literature about the effect of product reviews on consumer 
behavior, the approaches to shill review detection and deterrence, the methods to 
extract product features from product reviews and the concept of perceived quality. This 
study explores the linguistic characteristics of shill reviews and measures the impact of 
positive shill reviews on perceived product quality. These two aspects were not fully 
addressed in the literature. 

Chapter 3 addressed questions about the linguistic characteristics of shill reviews. 
To reveal these characteristics, shill reviews are compared to normal reviews using 
measures of informativeness, subjectivity and readability. Informativeness is measured 
based on the quantity of official and unofficial product features included in the reviews. 
Product features are detected and classified by a novel approach called the 
description-based feature extraction method. Subjectivity is measured using the well-known 
subjectivity extraction model proposed by Pang et al. (2004b). Readability is measured 
using five readability indexes widely used in studies about online reviews. The above 
measures of shill and normal reviews are compared using one-tailed independent samples 
t-tests. 
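As an illustration of the comparison step, the following Python sketch computes a one-tailed independent samples t-test using only the standard library. It assumes equal variances in the two groups (the pooled-variance form); whether the study made this assumption is not stated here. The function names are hypothetical, and the p-value is obtained by numerically integrating the density of Student's t distribution.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_norm) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density with Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 0.5
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area


def one_tailed_ttest(sample_a, sample_b):
    """Test H1: mean(sample_a) > mean(sample_b), assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t_stat, df, 1 - t_cdf(t_stat, df)


# Made-up scores for two small groups of reviews (not data from the study).
shill_scores = [12, 15, 14, 16, 13]
normal_scores = [9, 11, 10, 12, 8]
t_stat, df, p_value = one_tailed_ttest(shill_scores, normal_scores)
print(t_stat, df, p_value)
```

A small p-value supports the directional hypothesis that the first group has the larger mean. In practice a statistics package would normally be used instead of hand-written integration.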

Chapter 4 measured the impact of shill reviews on perceived product quality. In 
an experiment, consumer quality perceptions are measured twice: when consumers have only 
seen the rating summary and when consumers have read the content of the reviews. On 
both occasions, a number of shill reviews are injected into the product's review set. The 
measures of perceived quality are then compared to reveal the impact of reading shill 
reviews on perceived product quality. 

5.2 Results and implications 

5.2.1 Linguistic characteristics of shill reviews 

Shill reviews are an emerging problem of online review systems. To effectively 
detect shill reviews, it is important to distinguish them from the normal reviews. The 
objective of this study was to explore the characteristics of shill reviews by comparing 
them with the normal reviews and create a novel method to perform product feature 
detection and classification. Unlike previous studies which try to detect shill reviews 
from publicly available reviews, this study collects shill reviews via a data collection 
procedure. Official and unofficial features of the product were extracted from the reviews 
using a description-based feature extraction method. Having a wide variety of 
unofficial features included in a review indicates the reviewer's knowledge of the 
product. 
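The idea of counting official and unofficial features can be sketched in Python as follows. This is a deliberately simplified stand-in based on keyword matching, not the description-based feature extraction method itself. The function name, both feature lists and the example review are hypothetical; the official terms are loosely taken from the product description in the Appendix.

```python
import re


def count_features(review, official_features, unofficial_features):
    """Return the official and unofficial feature terms mentioned in a review."""
    text = review.lower()

    def mentioned(terms):
        # Whole-word, case-insensitive match of each feature term.
        return sorted(
            term for term in terms
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
        )

    return {
        "official": mentioned(official_features),
        "unofficial": mentioned(unofficial_features),
    }


# Hypothetical feature lists: official terms appear in the product description,
# unofficial terms only appear in customer reviews.
official = {"touch screen", "fm radio", "headphones", "usb"}
unofficial = {"battery life", "sound quality"}

review = "The touch screen is responsive and the FM radio works, but battery life is short."
found = count_features(review, official, unofficial)
print(found, len(found["official"]) + len(found["unofficial"]))
```

Under this simplification, a review that mostly repeats official features would look like the shill reviews described in this study, while a review that also mentions unofficial features would suggest first-hand experience.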

Our results suggest significant differences between shill reviews and normal 
reviews in terms of informativeness, readability and subjectivity level. The shill reviews 
are less readable than normal reviews. The content of shill reviews is usually repetitive 
and long because shill reviewers try to mention as many features as possible in the 
reviews. The repetitive content of the shill reviews focuses mainly on the official features 
included in the product description. This is not surprising as the product description is the 
main source of information used by shill reviewers. In addition, unlike normal reviewers 
who usually use subjective statements to express their personal opinion about the 
product, shill reviewers use more objective sentences similar to the ones in the product 
description. This finding demonstrates that normal reviewers personally used the product 
and were confident in judging the product. In contrast, shill reviewers had no personal 
experience using the product and just described the product features in their reviews. 
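To show how a readability index turns text into a number, the sketch below computes the Automated Readability Index (ARI), whose standard formula is 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43. The ARI is used here only as an example of such an index; which five indexes were used in this study is not restated in this section. The tokenization rules (whitespace-separated words, sentences ending in ".", "!" or "?") are simplifying assumptions.

```python
import re


def automated_readability_index(text):
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    characters = sum(ch.isalnum() for ch in text)  # letters and digits only
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


# Two invented snippets: short simple sentences versus one long, dense sentence.
simple = "The cat sat. The dog ran."
dense = ("This multifunctional touchscreen player supports numerous audio "
         "formats including MIDI and WMA.")
print(automated_readability_index(simple), automated_readability_index(dense))
```

A higher score indicates text that is harder to read, so a long review that strings together many specification terms would tend to score higher than a short personal comment.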

5.2.2 The impact of shill reviews on perceived quality 

Customer reviews play an important role in the success of online product sales. 
Our explanation for this relationship is that customer reviews impact perceptions of 
product quality which affects purchase decisions and product sales. To increase product 
sales, shill reviews are published to improve the reputation of the target product. The 
objective of this study is to understand the relationship between product reviews and 
quality perceptions and the role that shill reviews play in influencing perceived product 
quality. 

Using shill reviews collected in an experiment and normal reviews collected from 
Amazon.com, we compose multiple different review sets for an MP3 player. In another 
experiment, the same product with different review sets is shown to different groups of 
consumers. Perceptions of product quality are measured using a survey, first when 
consumers only see the rating summary and then after consumers read the review content. 
Three linear regression models are used to measure the impact of the review rating summary 
on the first impression and on review usage, and the impact of consumer reading behaviors 
on the change in perceived product quality. 

The results show that the product's average rating affects the first impression and 
that review quantity influences the total number of reviews read and the median time spent 
on reading each review. However, not all reviews had the same effect on changes in 
consumer quality perceptions. Reading normal reviews did not make consumers change their 
minds about the quality of the product. However, when consumers read more shill 
reviews and spent more time reading shill reviews, their assessment of the quality of the 
product changed. The results show that when there are more shill reviews in the review 
set, perceived product quality increases. This result is true both before and after the 
consumers read the reviews. After reading the reviews, the effect of shill reviews was 
even stronger, indicating that consumers were unable to detect that reviews were shill 
reviews and that the shill reviews were successful in influencing the assessment of product 
quality. This finding is consistent with the conclusion of Jindal et al. (2008), who state 
that it is impossible to distinguish shill reviews from normal reviews even if they are 
manually read. 

The findings of this study have both theoretical and practical contributions. 
Theoretically, we provided evidence that word-of-mouth, in an online shopping 
environment, can be used as an important indicator of perceived product quality. 
Practically, marketers can use online review ratings as a measure of perceived product 
quality. Online reviews can also be used as a tool to improve consumers' perceptions of 
quality, especially in cases where advertising doesn't effectively do so (Clark et al., 
2009). 

Our approach is different from the previous studies in several ways. First, shill 
reviews were submitted by real shills. Unlike other studies that collected publicly 
available reviews from online marketplaces and classified them as shill reviews, this study 
collected shill reviews in an experiment in which the subjects posed as shills. Second, 
most of the previous studies focus on the sales of the product while our focus is on 
perceived product quality. We explain the association between customer reviews and sales 
through the mediating effect of perceived product quality. We extend the Zeithaml 
framework by identifying the relationship between customer reviews and perceived quality. 
Third, we determined the connection between the overall reputation information and the 
review usage, which was not addressed in previous studies. It is feasible to explore this 
relationship because the experiment setting allows us to monitor the review usage. Finally, 
while other research indicated awareness of shill reviews, the effect of shill reviews on 
consumers was unknown. In this study, we measured the magnitude of the impact of the shill 
reviews. 

5.3 Limitations and future research 

5.3.1 Negative shill reviews and shill review detection 

The first part of this study has several limitations. First, although the sample size 
meets the requirements for the analysis method used, it is still relatively small. Our 
results might be more convincing with a larger sample. Second, we only collected positive 
shill reviews while negative shill reviews are also very interesting to analyze. In a market 
with few competitors, damaging the reputation of the competitors might result in 
increasing sales of a target product. So the incentive for submitting negative shill reviews 
can also be very high. Third, our method is limited to comparing the characteristics of 
shill and normal reviews. Future research is necessary to address these limitations and to 
extend this approach such that it can become a robust method to detect shill reviews. 

5.3.2 The impact of shill reviews on product preference 

The second part of this study also has a number of limitations. First, only positive 
shill reviews are considered while there are also negative shill reviews, especially in 
markets with few competitors. Second, although prize money was offered to 
improve the quality of shill reviews, the quality of the shill reviews is not verifiable. 
Third, only one product was used in the experiment. To generalize the results, different 
product categories should be used. Fourth, the participants are a convenience sample. 
Finally, we only collected reviews from hired shill reviewers who never used the product. 
In real life, shill reviews can come from anyone, including the author of a book or 
the manufacturer of a television. A shill review written by the author of a book may 
differ from a shill review submitted by someone who never read that book. 

This research can be extended in multiple directions. Further work can be done to 
measure the effect of shill reviews on perceived quality when other factors such as prices 
and product features are not fixed. In such a situation, it would be interesting to see 
whether shill reviews are powerful enough to make consumers change their product 
preference. Another future direction for this research is to measure how the order of 
appearance changes the effectiveness of shill reviews, because reviews higher on the list 
have a better chance of being read. We also want to look into the characteristics and 
effect of negative shill reviews in a future study. Negative shill reviews can be effective, 
especially in cases where a limited quantity of products is being offered and consumers do 
not have many products to choose from. Finally, indirect review manipulation is an 
important area to investigate. Indirect review manipulation occurs when shill reviews are 
rated as helpful by the shills themselves. How highly rated shill reviews impact perceived 
quality is another promising question for future study. 

Appendix 



A. The website for shill collection 



This website has 3 sections. 

Free Store 
An Online Electronic Marketplace 

Product Review 

Contact Info 
Full Name: 
Email Address: 
Professor full name: 

Terms & Instructions 
• Your name will NOT be displayed. 
• Please use your UCDENVER email address. 
• Your email address will NOT be shared with anyone (strict rule). 
• Please make sure that you type your email address correctly. 
• We will use this email address to communicate with you about the reward for 
  writing the review. 
• Who is the professor of your MGMT-3000 or BUSN-G520 course? 
• The review must be a positive review whose content should positively change the 
  consumer's perception about the product quality. 
• Please don't look for the product information on the Internet. 
• You don't need to be familiar with the product or the brand to write the review. 
• There's no limit on the length of the reviews. 

Picture 1 Contact information and Terms & Instructions 

• The students are asked to provide their full name, class information and email 
  address. The class information is used to identify which class the student 
  attends for the purpose of allocating course credit. The student must use an 
  @ucdenver.edu email address to participate in this study. The purpose of 
  the email address is to uniquely identify the student and to communicate 
  about the monetary reward. 

• Even though we already changed the brand name, the product name and the 
  model number of the product, we specifically ask the students not to look for 
  the product on the Internet. 

• We ask the reviewers to intentionally submit a positive review for the 
  product even though they might never have used this product before. There is no 
  instruction about what the content of the review should be. 



Product 

Tero 4GB MP3 Player with touchscreen 

Product Specifications 

Features 
• Touch Screen Mp3 player 
• Music, Video, Photos, FM Radio, Text, Record 
• Built-in-Speaker 
• Headphones (Awesome Sound) 
• Computer USB interface 

Technical Information 
• Brand name: Tero 
• Model: PWC1Q-534X 
• Digital Storage Capacity: 4 GB 
• Display: 2.8" TFT Big Screen (320 x 240 Pixel) 
• E-BOOK function (supports TXT, IRC files) 
• Supported Audio Format: Supporting MIDI, MP3, WMA, WMV, ASF, MTV and WAV formats 
• Video Playback Format: AVI, VOB, DAT, MPEG and 3GP 
• Supported Image Format: BMP, JPEG 
• Connect to Computer: USB 2.0 (Full Speed) 
• Power Supply: Built-in Rechargeable Battery (Approx. 8 hrs of Playing Time) 
• Digital Voice Recording Up to 5 Hours 
• Built-in FM Radio Channels 
• Multi-languages Support 

Picture 2 Product information 

• Two pictures of the product. 
• The product specifications. 
• No price is shown. 

Review 

Rate this product: 
Review Title (max: 128 characters) 
Comment 
Submit 

Picture 3 The review 

• The reviewers can rate the product (1 star to 5 stars). 

• The length of the review title is limited to 128 characters. This limitation 
  is similar to the Amazon.com review title length requirement. 

• The length of the text comment is unlimited. 
B. The product evaluation website 

This website has 5 sections. 

An Online Electronic Marketplace 

Contact Info 
Full Name: 
Email Address: 
• Please use your UCDENVER email address. 
• Your email address will NOT be shared with anyone. 
• Please make sure that you type your email address correctly. 

Terms & Instructions (Please read carefully) 
• The reviews below are real-life reviews whose content was not verified. 
• Please use only the information provided in this webpage to make your decision. Don't 
  use information from other places. 
• You can only participate in this experiment one time. 
• Please complete the survey. 

Picture 4 Contact information 

• Each student can participate in this experiment only one time. Every 
  student at the University of Colorado Denver has one unique 
  @ucdenver.edu email address. We use this email address to limit the 
  participation of the students. 

• The participants are not told that there are shill reviews in the review set. 
  However, they are informed that the content of the reviews was not 
  verified. 

• The participants are specifically asked not to look for the product 
  information elsewhere during their participation in this experiment. 
Product Information 

Product Specifications 

Features 
• Touch Screen Mp3 player 
• Music, Video, Photos, FM Radio, Text, Record 
• Built-in-Speaker 
• Headphones (Awesome Sound) 
• Computer USB interface 

Technical Information 
• Brand name: Tero 
• Model: PWC10-584X 
• Digital Storage Capacity: 4 GB 
• Display: 2.8" TFT Big Screen (320 x 240 Pixel) 
• E-BOOK function (supports TXT, IRC files) 
• Supported Audio Format: Supporting MIDI, MP3, WMA, WMV, ASF, MTV and WAV formats 
• Video Playback Format: AVI, VOB, DAT, MPEG and 3GP 
• Supported Image Format: BMP, JPEG 
• Connect to Computer: USB 2.0 (Full Speed) 
• Power Supply: Built-in Rechargeable Battery (Approx. 8 hrs of Playing Time) 
• Digital Voice Recording Up to 5 Hours 
• Built-in FM Radio Channels 
• Multi-languages Support 

Picture 5 Product information 

• We control for the factors that might impact the consumer's perception of product 
  quality. 
    o Brand name: the brand name was changed. 
    o Advertising: 
        ■ Two pictures of the product. 
        ■ The product specifications. No additional advertising 
          information is available. 
    o Price: 
        ■ No price is shown. 
Summary Product Rating 

Average rating: (2.6) 
Quantity of reviews: 10 reviews 
Rating distribution: 

Type      Quantity 
5-Star    (1) 
4-Star    (3) 
3-Star    (0) 
2-Star    (3) 
1-Star    (3) 

By using the summary product rating only, how would you evaluate the quality of this MP3 player? 

Very low           Neutral          Very high 
    1     2     3     4     5     6     7 
    O     O     O     O     O     O     O 

Continue (1/3) 

Picture 6 The overall reputation information 

• In this section, the participants are shown the overall reputation information of the 
  product: 
o Average rating (valence) 
o Quantity of reviews (volume) 
o Rating distribution (variance). 

• With only the overall reputation information, the participants are asked to 
give their opinion about the quality of the product. 

• At this moment, the text content of the review is hidden. The participants 
must answer the question about product quality perception before moving 
on to the next section. 
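The three summary measures can be reproduced from a rating distribution with a few lines of Python. The sketch below uses the distribution shown in Picture 6 (one 5-star, three 4-star, no 3-star, three 2-star and three 1-star reviews), which yields the displayed average of 2.6 over 10 reviews. The function name is hypothetical, and the use of the population variance is an assumption, since the exact variance formula is not restated here.

```python
from statistics import mean, pvariance


def rating_summary(distribution):
    """Compute valence, volume and variance from a {stars: count} distribution."""
    ratings = [stars for stars, count in distribution.items() for _ in range(count)]
    if not ratings:
        raise ValueError("the distribution must contain at least one review")
    return {
        "valence": mean(ratings),        # average rating
        "volume": len(ratings),          # quantity of reviews
        "variance": pvariance(ratings),  # population variance (assumed definition)
    }


# Rating distribution as displayed in Picture 6.
summary = rating_summary({5: 1, 4: 3, 3: 0, 2: 3, 1: 3})
print(summary)
```

With the sample variance instead of the population variance, the divisor would be 9 rather than 10 and the variance figure would be slightly larger.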
Full reviews 

Please click on the title to expand the review. 
Rate the helpfulness of each review you read. 
Click Continue after you are done reading the reviews. After you click the Continue button, 
you can't interact with the reviews anymore. 

Pretty Solid Product 

This review is: Not helpful O O Very helpful 

This Mp3 player is great. I like that it is touch screen and that it does picture and video 
as well as music. 4 GB is a fairly decent amount of memory for my needs providing that I am 
primarily using it for music. I don't know how much video I would watch, probably not very 
much. I do know that video takes up a lot of space so that would be the only downfall of 
this Mp3 player, which is why I gave it 4 stars instead of 5. The fact that it comes with 
headphones is nice because I don't have a whole lot of money to spend on the Mp3 player as 
well as extra headphones so that is a great value for me. I am bi-lingual and sometimes I 
prefer my various devices to be in Spanish so the Multi-language option is perfect. The 
screen is not super huge, but I prefer a smaller device that fits better into my pocket, 
especially when I'm working out. I imagine that I would be using this Mp3 player in the gym 
85% of the time if not more so this works well for me. Overall I really like this product 
and would recommend it to others. 

Everything I could ever want in a phone and more! 

unhappy with product 

Touch Screen Pyrus MP3 Player 

Picture 7 Full review 

• To expand a review, the participant must click on its title. 

• Once a review is expanded, we clock the time the participant spends reading that 
  review. The clock for that review stops once the participant expands another 
  review or finishes reading by moving on to the next section. 

• After reading the review, the participant must rate its helpfulness. 

• The participant must read at least one review. The participants are not 
  required to read all the reviews. 

• The order of appearance of the reviews is random. 
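The timing rule described above can be sketched in Python as follows: the clock for a review starts when it is expanded and stops when another review is expanded or when the participant continues to the next section. The event format, the action names and the function name are assumptions made for this illustration; they do not describe the actual implementation of the website.

```python
from statistics import median


def reading_times(events):
    """Accumulate seconds spent on each review.

    events is a time-ordered list of (timestamp_seconds, action, review_id),
    where action is "expand" or "continue" (review_id is None for "continue").
    """
    totals = {}
    open_review = None
    opened_at = None
    for timestamp, action, review_id in events:
        if action not in ("expand", "continue"):
            raise ValueError(f"unknown action: {action}")
        if open_review is not None:
            # Any new event stops the clock of the review that is currently open.
            totals[open_review] = totals.get(open_review, 0.0) + (timestamp - opened_at)
            open_review = None
        if action == "expand":
            open_review, opened_at = review_id, timestamp
        else:
            break  # "continue": the participant leaves the review section
    return totals


# Hypothetical click log: r1 is opened twice, r2 once.
log = [(0, "expand", "r1"), (12, "expand", "r2"), (20, "expand", "r1"), (25, "continue", None)]
times = reading_times(log)
print(times, median(times.values()))
```

Re-opening a review adds to its running total, which is one reasonable choice; counting only the first or the longest viewing would be equally easy to implement.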



After reading the reviews, how would you evaluate the quality of this MP3 player? 

Very low           Neutral          Very high 
    1     2     3     4     5     6     7 
    O     O     O     O     O     O     O 

Continue (3/3) 

Picture 8 Product quality evaluation after reading the reviews 

• The participants are asked to evaluate the product quality one more time after 
  reading the reviews. 

• Then, the participants must answer all the questions in a short survey before 
  finishing their participation. 



Survey 

1. What is your gender? 
   □ Female 
   □ Male 

2. How old are you? 
   □ 21 and Under 
   □ 22 to 34 
   □ 35 to 44 
   □ 45 to 54 
   □ 55 to 64 
   □ 65 and Over 

3. What is your total annual household income? 
   □ Less than $10,000 
   □ $10,000-$19,999 
   □ $20,000-$29,999 
   □ $30,000-$39,999 
   □ $40,000-$49,999 
   □ $50,000-$59,999 
   □ $60,000-$69,999 
   □ $70,000-$79,999 
   □ $80,000-$89,999 
   □ $90,000-$100,000 
   □ More than $100,000 

4. What is your highest education level? 
   □ High School Diploma 
   □ Bachelor's Degree 
   □ Master's Degree 
   □ PhD Degree 

5. If you are an undergraduate or graduate student, what is your current level? 
   □ None 
   □ Freshman 
   □ Sophomore 
   □ Junior 
   □ Senior 
   □ Master 
   □ PhD 

6. How would you describe your English reading ability? 
   Poor  1 □  2 □  3 □  4 □  5 □  6 □  7 □  Excellent 

7. On average, how many hours a week do you use the Internet? 
   □ to 5 hours/week 
   □ 6 to 10 hours/week 
   □ 11 to 20 hours/week 
   □ More than 20 hours/week 

8. How often do you use the Internet for shopping? 
   Never  1 □  2 □  3 □  4 □  5 □  6 □  7 □  Very often 

9. How often do you use product reviews prior to making an online purchase? 
   Not at all  1 □  2 □  3 □  4 □  5 □  6 □  7 □  Very often 

10. On average, how many product reviews do you read before making a purchase? 
   □ None 
   □ 1 to 2 reviews 
   □ 3 to 4 reviews 
   □ 5 to 6 reviews 
   □ 7 to 8 reviews 
   □ 9 to 10 reviews 
   □ More than 10 reviews 

11. Online customer review is an important source of information about product quality 
   □ Extremely disagree  □ Disagree  □ Slightly disagree  □ Neutral 
   □ Slightly agree  □ Agree  □ Strongly agree 

12. Positive reviews are important for my perception about product quality 
   □ Extremely disagree  □ Disagree  □ Slightly disagree  □ Neutral 
   □ Slightly agree  □ Agree  □ Strongly agree 

13. Negative reviews are important for my perception about product quality 
   □ Extremely disagree  □ Disagree  □ Slightly disagree  □ Neutral 
   □ Slightly agree  □ Agree  □ Strongly agree 

14. Shopping online is risky 
   □ Extremely disagree  □ Disagree  □ Slightly disagree  □ Neutral 
   □ Slightly agree  □ Agree  □ Strongly agree 

15. When shopping online, the risk of not getting what you pay for is high 
   □ Extremely disagree  □ Disagree  □ Slightly disagree  □ Neutral 
   □ Slightly agree  □ Agree  □ Strongly agree 